Abstract

This paper explores the mixing time of the random transposition walk
on the symmetric group $S_n$. While it has long been known that this walk
mixes in $O(n \log n)$ time, this result has not previously been attained using
coupling. A coupling argument showing the correct order mixing time is 
presented. This is accomplished by first projecting to conjugacy classes, 
and then using the Bubley-Dyer path coupling construction. In order to 
obtain appropriate bounds on the time it takes the path coupling to meet, 
ideas from Schramm's paper "Compositions of Random Transpositions" 
are used. 

1 Introduction 

This paper studies the random transposition walk on the symmetric group $S_n$
- in card shuffling terms, the possible permutations of a deck of $n$ cards. Here
is a description of the random walk: lay $n$ cards out in a row, pick one card
uniformly with your right hand, and pick another card independently and uniformly
with your left hand (note that you may have picked the same card). Then, swap the
cards - this is an extremely simple shuffling scheme for $n$ cards.
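
The shuffle just described is easy to simulate. The following is a minimal Python sketch; the names `transposition_step` and `shuffle`, and the seeded random generator, are illustrative choices and not notation from the paper.

```python
import random


def transposition_step(deck, rng):
    """One step of the walk: choose two positions independently and uniformly, then swap."""
    n = len(deck)
    i = rng.randrange(n)  # right hand
    j = rng.randrange(n)  # left hand; may equal i, in which case nothing moves
    deck = list(deck)
    deck[i], deck[j] = deck[j], deck[i]
    return deck


def shuffle(n, steps, seed=0):
    """Run the random transposition walk for `steps` steps from the sorted deck."""
    rng = random.Random(seed)
    deck = list(range(n))
    for _ in range(steps):
        deck = transposition_step(deck, rng)
    return deck
```

Since the two hands are independent, the walk stays in place with probability $1/n$ and applies each particular transposition with probability $2/n^2$.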

Below, we study the mixing time of the above random walk: that is, the
number of shuffles that it takes to thoroughly mix up the deck (see Section 2 for
a precise definition). To be more specific, a coupling argument demonstrating
that the mixing time of the random transposition walk is $O(n \log n)$ is presented.
Coupling is an intuitive probabilistic technique that bounds mixing time in the
following way: define a process $(X_t, Y_t)_{t \ge 0}$ such that both $(X_t)_{t \ge 0}$ and $(Y_t)_{t \ge 0}$
are Markov chains with the same transition matrix, but with $X_t$ starting at $x$
and $Y_t$ starting at $y$. As will be described more precisely in Section 2 below, the
goal is to have the two chains meet: by the time that this has happened with high
probability for every choice of $x$ and $y$, it can be shown that the Markov chain has
mixed. This technique is usually traced back to Doeblin [9]; two good reference
books which illustrate its many uses are Lindvall's "Lectures on the coupling
method" [15] and Thorisson's "Coupling, stationarity, and regeneration" [20].

The existence of a coupling argument showing an $O(n \log n)$ mixing time is
a long-standing open problem. Due to its simplicity and symmetry, the random
transposition walk was one of the first ones considered in the burgeoning field of
Markov chain mixing times. As noted in [8], the mixing time of this walk was
first bounded by Aldous in 1980, who showed that it must be between order $n$
and $n^2$ and conjectured that it must be of order $n \log n$. This was proved in 1981
in "Generating a random permutation with random transpositions" by Diaconis
and Shahshahani [8]. That paper uses Fourier analysis to show that the walk
experiences a cut-off, mixing in a window of order $n$ around time $\frac{1}{2} n \log n$.

The beautiful proof in [8] uses the tools of representation theory and Fourier
analysis, and hence is non-probabilistic. While a purely probabilistic strong
stationary time proof for an $O(n \log n)$ mixing time was discovered by Broder
in 1985 [3], a coupling argument proved to be more elusive. The main difficulty
is due to the fact that a Markovian coupling cannot succeed; indeed, Lemma 8
below shows that such an approach can never prove a bound of order better than
$n^2$. It has been shown by Griffeath [10] and then Pitman [18] that a maximal
coupling must exist, but it evidently has to be non-Markovian. There has been
continued interest in finding such a coupling - for example, Peres named it as
an interesting open problem in [17]. This paper resolves this problem. (Another
approach for finding such a non-Markovian coupling can be seen in the preprint
"Mixing times via super-fast coupling" [13].)

This question is approached here by first projecting the random transposition
walk to conjugacy classes. This projection is also a Markov chain, called a
split-merge random walk [19]. Using the fact that the distribution of the random
transposition walk started from the identity is constant on conjugacy classes, it
suffices to find the mixing time of the split-merge random walk. The path coupling
technique of Bubley and Dyer [4] is used to examine the split-merge random walk.
However, this is not straightforward, since in the worst case scenario, the
split-merge random walk contracts by a factor of order only $1 - \frac{1}{n^2}$.

It is shown here that on average, the split-merge random walk does indeed
contract by a factor of order $1 - \frac{1}{n}$, enabling the use of path coupling to conclude
that the walk mixes in $O(n \log n)$ time. This argument does not, however, show
cut-off: indeed, as noted in Remark 37 below, the constant in front of the $n \log n$ is
very large. To show that the contraction coefficient is of the right order, the
techniques of Schramm from his paper "Compositions of random transpositions"
[19] are used. He shows that large cycles in the random transposition walk
emerge after time $\frac{n}{2}$, and then proves the law for the scaled cycles. Methods
from "Compositions of random transpositions" have given rise to the wonderful
paper "Mixing times for random k-cycles and coalescence-fragmentation chains"
by Berestycki, Schramm, and Zeitouni [2], which uses probabilistic techniques
to get the right answer for a generalization of the random transposition walk.

2 Background and Definitions 

Before stating the main result of this paper, a number of definitions are
necessary. If $\mu$ and $\nu$ are two probability distributions on a finite state space $\Omega$,
then the total variation distance between $\mu$ and $\nu$ is defined to be

\[ \|\mu - \nu\|_{TV} = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)| \]

For a Markov chain with transition probabilities $Q(x, y)$
and stationary distribution $\pi$, the total variation distance at time $t$ is defined
to be $d(t) = \max_{x \in \Omega} \|Q^t(x, \cdot) - \pi\|_{TV}$, and the mixing time is

\[ \tau_{mix}(\varepsilon) = \min\{t \mid d(t) \le \varepsilon\} \]

Conventionally, $\tau_{mix}$ is defined to be $\tau_{mix}(1/4)$.
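
For very small decks these definitions can be evaluated exactly. The sketch below is only an illustration: the function names are hypothetical, the computation enumerates all of $S_n$ (so it is feasible only for tiny $n$), and it starts from the identity, which suffices because the walk is a random walk on a group.

```python
from fractions import Fraction
from itertools import permutations


def total_variation(mu, nu):
    """Half the L1 distance between two distributions given as dicts."""
    keys = set(mu) | set(nu)
    return sum(abs(mu.get(k, 0) - nu.get(k, 0)) for k in keys) / 2


def step_distribution(dist, n):
    """Push a distribution on S_n one step forward under random transpositions."""
    new = {}
    weight = Fraction(1, n * n)  # each ordered pair of positions (i, j)
    for perm, p in dist.items():
        for i in range(n):
            for j in range(n):
                q = list(perm)
                q[i], q[j] = q[j], q[i]
                q = tuple(q)
                new[q] = new.get(q, 0) + p * weight
    return new


def mixing_time(n, eps=Fraction(1, 4)):
    """Smallest t with d(t) <= eps, computed exactly from the identity."""
    perms = list(permutations(range(n)))
    uniform = {p: Fraction(1, len(perms)) for p in perms}
    dist = {tuple(range(n)): Fraction(1)}
    t = 0
    while total_variation(dist, uniform) > eps:
        dist = step_distribution(dist, n)
        t += 1
    return t
```

Exact rational arithmetic is used so that the comparison with $\varepsilon$ is not affected by rounding.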

A coupling of a pair of Markov chains both with transition matrix $Q$ is a
process $(X_t, Y_t)_{t \ge 0}$ such that both $(X_t)_{t \ge 0}$ and $(Y_t)_{t \ge 0}$ are Markov chains with
transition matrix $Q$, but which might have different starting distributions. The
coupling inequality (Corollary 5.3 in [14]) states that if $(X_t, Y_t)$ is a coupling of
a pair of Markov chains such that $X_0 = x$ and $Y_0 = y$, and $T_{x,y}$ is a random
time at which the chains have met, then

\[ d(t) \le \max_{x, y} \mathbb{P}\{T_{x,y} > t\} \]

The above inequality allows coupling to be used to bound mixing times. It is
now possible to state the main result of this paper:

Theorem 1. There exists a coupling argument that shows that the random
transposition walk on $S_n$ mixes in time of order $n \log n$: that is, it demonstrates
that there exists a constant $C$ such that

\[ \tau_{mix} \le C n \log n \]

Before launching into the proof, it is instructive to consider the many ways
an $O(n \log n)$ mixing time has been obtained for this walk, as well as the uses
of the result. This bound was first obtained by Diaconis and Shahshahani in
[8]. This result is beautiful and extremely precise; however, the scope of the
technique is limited as it requires fully diagonalizing the random walk. While
this is possible for a number of walks, including walks that are not random
walks on groups, this is a drawback to the method. This result is also extremely
useful for comparison theory. As shown by Diaconis and Saloff-Coste in [7], the
Dirichlet form can be used to compare all the eigenvalues of the chain, resulting
in good bounds for a variety of walks. For example, Jonasson uses this result 
in [12] to show that the overlapping cycle shuffle mixes in $O(n^3 \log n)$ time.

As noted above, the first probabilistic proof of the result was by Broder [3]
and used strong stationary times: stopping times $T$ such that the conditional
distribution of $X_T$ given $T$ is stationary. Since the stationary distribution for
the random transposition walk is uniform, this is equivalent to stating that for
all $\sigma \in S_n$ and all positive integers $k$,

\[ \mathbb{P}(X_T = \sigma \mid T = k) = \frac{1}{n!} \]

The following is Broder's strong stationary time argument, as summarized in
Chapter 9 of [14]. Let $R_t$ and $L_t$ be the cards chosen by the right and left hand,
respectively. Start the process with no marked cards, and use the following
marking scheme: at each step, mark the card $R_t$ if $R_t$ is unmarked, and either (a)
$L_t$ is marked or (b) $R_t = L_t$. Define the stopping time $T$ to be the first time all
$n$ cards are marked. It is easy to show that this is indeed a strong stationary
time, and that $T$ is around $2 n \log n$. This argument provides an $O(n \log n)$
mixing time, but not the correct constant. It was improved by Matthews [16] in
1988 by creating a more complicated rule for marking the cards. This argument
showed a cut-off for the walk at time $\frac{1}{2} n \log n$. These arguments are probabilistic
and intuitive, and elucidate the reasons for the mixing time in a way that Fourier
analysis does not. However, they are heavily reliant on the symmetry of the
random transposition walk and as such are difficult to generalize.
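
The marking scheme above can be simulated directly. This is a minimal sketch with a hypothetical function name; it tracks only which card labels are marked, which is enough to sample $T$ because the marking rule depends on the labels chosen and not on their positions.

```python
import random


def broder_time(n, rng):
    """Sample the first time at which all n cards are marked under the rule above."""
    marked = [False] * n
    count = 0
    t = 0
    while count < n:
        t += 1
        r = rng.randrange(n)  # card chosen by the right hand
        l = rng.randrange(n)  # card chosen by the left hand
        # Mark R_t if it is unmarked and either L_t is marked or R_t = L_t.
        if not marked[r] and (marked[l] or r == l):
            marked[r] = True
            count += 1
    return t
```

With $k$ cards already marked, a step marks a new card with probability $(n - k)(k + 1)/n^2$, so averaging many samples of `broder_time(n, rng)` should give a value near $2 n \log n$ for large $n$.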

The recent paper by Berestycki, Schramm and Zeitouni [2] uses a different
approach. Their technique provides the correct answer for the following
generalization of the Markov chain: instead of using a uniformly chosen random
transposition at each step, a random $k$-cycle is used. That paper obtains the
correct $\frac{1}{k} n \log n$ answer for any fixed $k$. Like the present paper, they begin by
projecting the walk to conjugacy classes and then make use of the results of Schramm
in [19]. The tools of both this result and Schramm's original paper are graph
theoretic: for example, a transposition is considered to be an edge in a
random graph process on $n$ vertices. Unfortunately, this exciting method again
requires considerable symmetry, since the projection to conjugacy classes has to
be a Markov chain. This is also a drawback of the coupling approach which is
presented here.

Another intriguing technique explored by Burton and Kovchegov [13] uses
non-Markovian coupling. While I have found the ideas in that paper difficult,
the approximate approach is that the standard coupling argument by Aldous
which results in an $O(n^2)$ bound can be improved by 'looking into the future.' A
non-Markovian argument with a somewhat similar flavor has previously been
implemented for the coloring chain by Hayes and Vigoda [11]. Here's a very
approximate sketch of the idea for random transpositions: say that a pair $(\sigma, \tau)$
in $S_n$ currently differs in the cards labeled $i$ and $j$. The standard coupling for
this pair transposes the cards with the same labels in both $\sigma$ and $\tau$, unless the
next transposition is $(i, j)$. However, it is possible to do something different:
if the next step transposes cards labeled $i$ and $k$ in $\sigma$, the next step in $\tau$ can
transpose either cards labeled $i$ and $k$ or cards labeled $j$ and $k$. If the coupling
is Markovian, then the choice makes no difference; however, 'looking into the
future' can substantially improve the bounds. In work stemming from an
unrelated project, I hope to show this for a number of different walks in an upcoming
paper.

The argument in this paper proceeds by projecting the walk to conjugacy
classes. It is a well-known result that the conjugacy classes of $S_n$ are indexed
by partitions of $n$. Recall that a partition of $n$ is an $m$-tuple $(a_1, a_2, \ldots, a_m)$
of positive integers that sum to $n$, where $m$ can be any integer, and
$a_1 \ge a_2 \ge \cdots \ge a_m$. Let $\mathcal{P}_n$ be the set of partitions of $n$. The projection of the random
transposition walk on $S_n$ to conjugacy classes is also a Markov chain, called a
split-merge random walk. It is often referred to as a coagulation-fragmentation
chain, and it has been extensively studied - see [6] for some references.

Definition 2. Assume the random walk is currently at partition $(a_1, \ldots, a_m)$.
Then, there are three possibilities for the next move: either merge a pair of parts,
split a part into two pieces, or stay in place. (All of these moves are followed by
rearranging the new parts to be in non-increasing order.)

• Split: A part $a_i$ can be replaced by the pair $(r, a_i - r)$. For each $r$ between
1 and $a_i - 1$, the probability of this particular split is $\frac{a_i}{n^2}$.

Note that this phrasing takes the order into account: here, a more
convenient phrasing is the following: for each $r < \frac{a_i}{2}$, split $a_i$ into $\{r, a_i - r\}$
with probability $\frac{2 a_i}{n^2}$. If $a_i$ is even and $r = \frac{a_i}{2}$, split $a_i$ into $\{r, a_i - r\}$ with
probability $\frac{a_i}{n^2}$.

• Merge: Replace the parts $a_i$ and $a_j$ by $a_i + a_j$. This is done with
probability $\frac{2 a_i a_j}{n^2}$.

• Stay in Place: Stay at the partition $(a_1, a_2, \ldots, a_m)$ with probability $\frac{1}{n}$.

Example 3. Here is an example of the split-merge random walk. Let $n = 5$,
and assume the walk is currently at $(4, 1)$. Then, the next step $X_1$ is distributed
as follows:

(5) with probability $\frac{8}{25}$
(4, 1) with probability $\frac{1}{5}$
(3, 1, 1) with probability $\frac{8}{25}$
(2, 2, 1) with probability $\frac{4}{25}$
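
The one-step law of the split-merge walk can be tabulated mechanically. The sketch below assumes the usual weights for this projection (an ordered split of a part $a$ has probability $a/n^2$, a merge of parts $a$ and $b$ has probability $2ab/n^2$, and staying has probability $1/n$); the function name `split_merge_step` is an illustrative choice.

```python
from collections import defaultdict
from fractions import Fraction


def split_merge_step(partition):
    """Return {next_partition: probability} for one step of the split-merge walk."""
    parts = list(partition)
    n = sum(parts)
    out = defaultdict(Fraction)
    out[tuple(sorted(parts, reverse=True))] += Fraction(1, n)  # stay in place
    for i, a in enumerate(parts):
        rest = parts[:i] + parts[i + 1:]
        # Ordered split (r, a - r): each has probability a / n^2.
        for r in range(1, a):
            new = tuple(sorted(rest + [r, a - r], reverse=True))
            out[new] += Fraction(a, n * n)
        # Merge parts i and j with i < j: probability 2 * a_i * a_j / n^2.
        for j in range(i + 1, len(parts)):
            others = parts[:i] + parts[i + 1:j] + parts[j + 1:]
            new = tuple(sorted(others + [a + parts[j]], reverse=True))
            out[new] += Fraction(2 * a * parts[j], n * n)
    return dict(out)
```

Calling `split_merge_step((4, 1))` gives the four outcomes of the example with $n = 5$, and the probabilities sum to 1.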

The primary walk under consideration is the split-merge random walk, but 
for some of the proofs, the original transposition walk is needed. With that in 
mind, make the following two definitions: 

Definition 4. For $\alpha \in S_n$, define $\mathrm{Cyc}(\alpha)$ to be the partition corresponding to
the cycle type of $\alpha$. For $\sigma \in \mathcal{P}_n$, let

\[ \mathrm{Perm}(\sigma) = \{\alpha \in S_n \mid \mathrm{Cyc}(\alpha) = \sigma\} \]

be the set of all permutations with cycle type $\sigma$.

Definition 5. Let $(X_t)_{t \ge 0}$ denote the split-merge random walk, and let $(\bar{X}_t)_{t \ge 0}$
denote the random transposition walk, so that for all $t$,

\[ X_t = \mathrm{Cyc}(\bar{X}_t) \]

Furthermore, let $P$ and $\pi$ be the transition matrix and stationary distribution
for $(X_t)_{t \ge 0}$, respectively, and define $\bar{P}$ and $\bar{\pi}$ analogously for $(\bar{X}_t)_{t \ge 0}$.

The next argument shows it suffices to consider the split-merge random walk. 
The following proof takes a little bit of space to write down, but is actually very
simple - the key idea is that the random transposition walk started at the 
identity is always uniformly distributed over each conjugacy class. (This also 
follows from a more general result - see Chapter 3F of [5].) 
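
Lemma 6. Let $P$, $\bar{P}$, $\pi$ and $\bar{\pi}$ be defined as in Definition 5 above. Then,

\[ \max_{\alpha \in S_n} \|\bar{P}^t(\alpha, \cdot) - \bar{\pi}\|_{TV} \le \max_{\sigma \in \mathcal{P}_n} \|P^t(\sigma, \cdot) - \pi\|_{TV} \]

Proof: Since the random transposition walk is a random walk on a group, it's
vertex transitive. Therefore, for all $\alpha \in S_n$,

\[ \|\bar{P}^t(\alpha, \cdot) - \bar{\pi}\|_{TV} = \|\bar{P}^t(\mathrm{id}, \cdot) - \bar{\pi}\|_{TV} \]

where id is the identity permutation. Thus, it suffices to show that

\[ \|\bar{P}^t(\mathrm{id}, \cdot) - \bar{\pi}\|_{TV} \le \max_{\sigma \in \mathcal{P}_n} \|P^t(\sigma, \cdot) - \pi\|_{TV} \]

Now, let $\sigma_0 = \mathrm{Cyc}(\mathrm{id}) = (1, 1, \ldots, 1)$. It suffices to show that

\[ \|\bar{P}^t(\mathrm{id}, \cdot) - \bar{\pi}\|_{TV} = \|P^t(\sigma_0, \cdot) - \pi\|_{TV} \tag{2.1} \]

Since the split-merge random walk is a projection of the random transposition
walk, for $\sigma \in \mathcal{P}_n$,

\[ \pi(\sigma) = \sum_{\alpha \in \mathrm{Perm}(\sigma)} \bar{\pi}(\alpha) = \frac{|\mathrm{Perm}(\sigma)|}{n!} \tag{2.2} \]

since $\bar{\pi}$ is the uniform distribution on $S_n$. Similarly,

\[ P^t(\sigma_0, \sigma) = \sum_{\alpha \in \mathrm{Perm}(\sigma)} \bar{P}^t(\mathrm{id}, \alpha) \]

Furthermore, note that both the identity permutation and the random
transposition walk are symmetric with respect to $\{1, 2, \ldots, n\}$. Hence for any $\alpha_1, \alpha_2$
with the same cycle structure, $\bar{P}^t(\mathrm{id}, \alpha_1) = \bar{P}^t(\mathrm{id}, \alpha_2)$ for all $t$. Combining this
with the equation above shows that for $\alpha \in \mathrm{Perm}(\sigma)$,

\[ P^t(\sigma_0, \sigma) = |\mathrm{Perm}(\sigma)| \, \bar{P}^t(\mathrm{id}, \alpha) \tag{2.3} \]

Using Equations (2.2) and (2.3),

\[ \sum_{\alpha \in \mathrm{Perm}(\sigma)} \left| \bar{P}^t(\mathrm{id}, \alpha) - \frac{1}{n!} \right| = \left| P^t(\sigma_0, \sigma) - \pi(\sigma) \right| \]

Finally, putting all this together,

\[ 2 \|\bar{P}^t(\mathrm{id}, \cdot) - \bar{\pi}\|_{TV} = \sum_{\alpha \in S_n} \left| \bar{P}^t(\mathrm{id}, \alpha) - \frac{1}{n!} \right| = \sum_{\sigma \in \mathcal{P}_n} \sum_{\alpha \in \mathrm{Perm}(\sigma)} \left| \bar{P}^t(\mathrm{id}, \alpha) - \frac{1}{n!} \right| \]
\[ = \sum_{\sigma \in \mathcal{P}_n} \left| P^t(\sigma_0, \sigma) - \pi(\sigma) \right| = 2 \|P^t(\sigma_0, \cdot) - \pi\|_{TV} \]

which proves Equation (2.1), as desired.

□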

Remark 7. Although it is not needed, it is very easy to use the triangle
inequality to prove the opposite inequality to the one in Lemma 6. Hence, the
two quantities are actually equal.
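
The identity between the two total variation distances can be checked by brute force for small $n$. The sketch below is an illustration only, with hypothetical function names: it evolves the exact law of the transposition walk from the identity, pushes it forward to cycle types, and compares the distance to uniform on $S_n$ with the distance of the projected law to the projected stationary distribution.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations


def cycle_type(perm):
    """Partition of n given by the cycle lengths of perm (a tuple mapping i -> perm[i])."""
    seen, parts = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, v = 0, start
        while v not in seen:
            seen.add(v)
            v = perm[v]
            length += 1
        parts.append(length)
    return tuple(sorted(parts, reverse=True))


def push(dist, n):
    """One step of the random transposition walk applied to an exact distribution."""
    new = defaultdict(Fraction)
    weight = Fraction(1, n * n)
    for perm, p in dist.items():
        for i in range(n):
            for j in range(n):
                q = list(perm)
                q[i], q[j] = q[j], q[i]
                new[tuple(q)] += p * weight
    return new


def tv(mu, nu):
    return sum(abs(mu.get(k, 0) - nu.get(k, 0)) for k in set(mu) | set(nu)) / 2


def compare_distances(n, steps):
    """List of (distance on S_n, distance on partitions) for t = 0, ..., steps."""
    perms = list(permutations(range(n)))
    uniform = {p: Fraction(1, len(perms)) for p in perms}
    pi = defaultdict(Fraction)
    for p in perms:
        pi[cycle_type(p)] += Fraction(1, len(perms))
    dist = {tuple(range(n)): Fraction(1)}
    results = []
    for _ in range(steps + 1):
        projected = defaultdict(Fraction)
        for p, w in dist.items():
            projected[cycle_type(p)] += w
        results.append((tv(dist, uniform), tv(projected, pi)))
        dist = push(dist, n)
    return results
```

For instance, every pair returned by `compare_distances(4, 4)` has two equal entries, in line with Equation (2.1).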

Before proceeding to sketch the upcoming proof, it is shown that a Markovian
coupling for the random transposition walk cannot hope to give an $O(n \log n)$
mixing time.

Lemma 8. A Markovian coupling $(X_t, Y_t)$ of the random transposition walk
takes at least $\Omega(n^2)$ time to meet.

Proof: It is easy to check that wherever the two random transposition walks
currently are, if $X_t \ne Y_t$, then for an absolute constant $c$,

\[ \mathbb{P}(X_{t+1} = Y_{t+1} \mid X_t, Y_t) \le \frac{c}{n^2} \]

To verify this, note that if $X_t$ and $Y_t$ differ only in the transposition $(i, j)$, then
the only way to meet is to transpose $i$ and $j$ in one of them, and to stay in place
in the other one; similar arguments hold if $X_t$ and $Y_t$ are two transpositions
apart, and in all other cases, the probability of meeting at the next step is 0.
Combining the above inequality with the Markov property leads to the desired
result. □
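
The order of the one-step meeting probability can be explored by enumeration for small $n$. The following sketch is not taken from the paper: it uses the elementary fact that, if the two walks currently differ by a permutation $\pi \ne \mathrm{id}$, any coupling of a single step meets with probability at most $\sum_a \min(\mathbb{P}(s = a), \mathbb{P}(s = \pi a))$, where $s$ is one step of the walk. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def compose(p, q):
    """The permutation i -> p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))


def step_law(n):
    """Law of one step: identity w.p. 1/n, each transposition w.p. 2/n^2."""
    ident = tuple(range(n))
    law = {ident: Fraction(1, n)}
    for i in range(n):
        for j in range(i + 1, n):
            s = list(ident)
            s[i], s[j] = s[j], s[i]
            law[tuple(s)] = Fraction(2, n * n)
    return law


def max_meeting_probability(n):
    """Largest one-step meeting bound over all non-identity differences pi."""
    law = step_law(n)
    ident = tuple(range(n))
    best = Fraction(0)
    for pi in permutations(range(n)):
        if pi == ident:
            continue
        bound = sum(
            min(p, law[compose(pi, a)])
            for a, p in law.items()
            if compose(pi, a) in law
        )
        best = max(best, bound)
    return best
```

For $n = 3$ and $n = 4$ this enumeration returns $6/n^2$; that constant is an output of this sketch for small $n$, not a value quoted from the source.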

Turn next to an explanation of the idea behind the coupling. The argument
uses path coupling - that is, coupling a pair of split-merge random walks started
at a neighboring pair of elements. This technique was invented by Bubley and
Dyer in [4]; a good reference is Chapter 14 of [14]. To be precise, endow the
state space with a connected graph structure: that is, select a set of edges $E'$
between elements of $\Omega$, such that for any $u, v \in \Omega$, there exists a path between $u$
and $v$ only using the edges in $E'$. It is then only necessary to define a coupling
for $(x, y) \in E'$.

Assign lengths $\ell(x, y) \ge 1$ to each edge $(x, y) \in E'$, and define a path metric
$\rho$ on $\Omega$ by

\[ \rho(z, w) = \min \left\{ \sum_{i=0}^{n-1} \ell(x_i, x_{i+1}) \;\middle|\; x_0 = z,\ x_n = w,\ (x_i, x_{i+1}) \in E' \text{ for all } i \right\} \]

Furthermore, define the diameter of the set $\Omega$ in the usual way as
$\mathrm{diam}(\Omega) = \max_{u, v \in \Omega} \rho(u, v)$. The following theorem is the basic path coupling bound.

Theorem 9. Let $(X_t)_{t \ge 0}$ be a Markov chain on a set $\Omega$, and let $E'$, $\ell$ and
$\rho$ be defined as above. Let $(X_1, Y_1)$ be the first step of a coupling started at
$(x, y) \in E'$. Then, if there is a $\kappa < 1$ such that for every $(x, y) \in E'$,

\[ \mathbb{E}[\rho(X_1, Y_1)] \le \kappa \rho(x, y) \tag{2.4} \]

then for all $t \ge 1$,

\[ d(t) \le \mathrm{diam}(\Omega) \kappa^t \]
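
To see what this bound gives numerically, one can solve $\mathrm{diam}(\Omega) \kappa^t \le \varepsilon$ for $t$. The helper below is a small sketch with a hypothetical name; the values passed to it are placeholders, not constants from the paper.

```python
def path_coupling_steps(diam, kappa, eps=0.25):
    """Smallest integer t >= 0 with diam * kappa**t <= eps."""
    if not 0 < kappa < 1:
        raise ValueError("kappa must lie strictly between 0 and 1")
    t = 0
    while diam * kappa ** t > eps:
        t += 1
    return t
```

For example, a chain of diameter $n$ with contraction $\kappa = 1 - \beta$ per block of steps would need about $\log(n / \varepsilon) / \beta$ blocks, which `path_coupling_steps(n, 1 - beta)` evaluates for any assumed values of `n` and `beta`.
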

Returning to the random walk under consideration, define neighboring pairs
of partitions to be precisely the pairs which are one step away in the split-merge
random walk. Then, define a coupled process $(X_t, Y_t)$ such that $X_0 = \sigma$ and
$Y_0 = \tau$, making sure that the distance between $X_t$ and $Y_t$ at each step is at
most 1. Here are some useful definitions.

Definition 10. For $\sigma$ and $\tau$ partitions of $n$, define $\rho(\sigma, \tau)$ to be the distance
between $\sigma$ and $\tau$ induced by the split-merge random walk; that is, $\rho(\sigma, \tau)$ is the
number of split-merge steps it takes to get from $\sigma$ to $\tau$.

The next definition is useful for finding a lower bound on the probability of 
coupling at each step given the current location of the two walks. 

Definition 11. Let $\sigma$ and $\tau$ be partitions of $n$ such that $\rho(\sigma, \tau) = 1$. Then
$\sigma$ and $\tau$ are exactly one merge away, so rearranging parts appropriately and
without loss of generality letting $\sigma$ be the partition with more parts,

\[ \sigma = (a_1, a_2, \ldots, a_m, b, c) \]
\[ \tau = (a_1, a_2, \ldots, a_m, b + c) \]

where $b \le c$. Then, define

\[ s(\tau, \sigma) = s(\sigma, \tau) = b \quad \text{and} \quad m(\tau, \sigma) = m(\sigma, \tau) = c \tag{2.6} \]

That is, since $\sigma$ and $\tau$ differ in the parts $b$, $c$ and $b + c$, $s(\sigma, \tau)$ is the smallest
part in which they differ, and $m(\sigma, \tau)$ is the medium part in which they differ.
For later use, define $m(\sigma, \sigma) = n$ and $s(\sigma, \sigma) = \frac{n}{2}$.
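
For two neighboring partitions, the parts $b$ and $c$ can be read off by comparing multisets of parts. This is a minimal sketch; the function name `differing_parts` is hypothetical, and it handles only pairs that are exactly one merge apart.

```python
from collections import Counter


def differing_parts(sigma, tau):
    """Return (s, m) = (b, c) with b <= c for partitions one merge apart."""
    if len(sigma) < len(tau):
        sigma, tau = tau, sigma  # let sigma be the partition with more parts
    only_sigma = sorted((Counter(sigma) - Counter(tau)).elements())
    only_tau = list((Counter(tau) - Counter(sigma)).elements())
    if len(only_sigma) != 2 or only_tau != [sum(only_sigma)]:
        raise ValueError("partitions are not exactly one merge apart")
    b, c = only_sigma
    return b, c
```

The first returned value is $s(\sigma, \tau)$ and the second is $m(\sigma, \tau)$; the order of the two arguments does not matter.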

In the next section, the coupling is given along with the following lemma: 

Lemma 12. Assume that $(X_t, Y_t) = (\sigma, \tau)$, for $\sigma$ and $\tau$ such that $\rho(\sigma, \tau) = 1$.
Then, $\rho(X_{t+1}, Y_{t+1}) \le 1$, and

\[ \mathbb{P}(X_{t+1} = Y_{t+1}) \ge \frac{4 s(\sigma, \tau)}{n^2} \]

That is, the chain stays at most distance 1 apart, and gives the above lower
bound for the probability of coupling.

After proving the above lemma, it is shown below that after order $n$ steps,
$s(X_t, Y_t)$ is on average of order $n$. The lemma then implies that the probability
of coupling at each step is of order $\frac{1}{n}$, which will show that there is a high
probability of coupling after order $n$ steps. Using the fact that the diameter of
the set of partitions is no greater than $n$, Theorem 9 shows that the random
transposition walk mixes in $O(n \log n)$ time.

3 The Coupling 

This section defines the coupling for neighboring pairs for the split-merge
random walk, and proves Lemma 12. The coupling is defined in such a way that
the distance between $X_t$ and $Y_t$ at each step is at most 1 for all $t$. As usual,
once the two chains meet, they are run together.






Definition 13. Consider the next step $(X_1, Y_1)$ of a coupling which is currently
at $(X_0, Y_0) = (\sigma, \tau)$, where $\rho(\sigma, \tau) = 1$ and

\[ \sigma = (a_1, a_2, \ldots, a_m, b, c) \]
\[ \tau = (a_1, a_2, \ldots, a_m, b + c) \]

where $b \le c$. There are a number of cases, considered in the following order: go
through the possible moves in $\sigma$, then provide corresponding moves in $\tau$.

• Operations only using the $a_i$: If $a_i$ and $a_j$ are merged in $\sigma$ for any $i$
and $j$, perform the same operation in $\tau$. Similarly, if $a_i$ is split in $\sigma$ into
$\{r, a_i - r\}$, do the same for $a_i$ in $\tau$. Then,

\[ X_1 = (a'_1, \ldots, a'_k, b, c) \]
\[ Y_1 = (a'_1, \ldots, a'_k, b + c) \]

for the appropriate $\{a'_1, a'_2, \ldots, a'_k\}$.

• Merging $b$ or $c$ and $a_i$: If $b$ and $a_i$ are merged in $\sigma$, merge $b + c$ and $a_i$
in $\tau$. If $c$ and $a_i$ are merged in $\sigma$, also merge $b + c$ and $a_i$ in $\tau$. In the
first case,

\[ X_1 = (a'_1, \ldots, a'_{m-1}, b + a_i, c) \]
\[ Y_1 = (a'_1, \ldots, a'_{m-1}, b + c + a_i) \]

where $\{a'_1, a'_2, \ldots, a'_{m-1}\} = \{a_1, a_2, \ldots, a_m\} \setminus \{a_i\}$. The case where $c$ and
$a_i$ are merged in $\sigma$ is analogous.

• Splitting $b$ or $c$: If $b$ is split in $\sigma$ into $\{r, b - r\}$ where $r \le \frac{b}{2}$, then split
$b + c$ in $\tau$ into $\{r, b + c - r\}$. Similarly, if $c$ is split in $\sigma$ into $\{r, c - r\}$
where $r \le \frac{c}{2}$, then split $b + c$ in $\tau$ into $\{r, b + c - r\}$. The first case results
in

\[ X_1 = (a_1, \ldots, a_m, r, b - r, c) \]
\[ Y_1 = (a_1, \ldots, a_m, r, b + c - r) \]

The second case, where $c$ is split into $\{r, c - r\}$, is analogous.

• Staying in place: If the walk stays in place in $\sigma$, it is coupled with
either staying in place in $\tau$ or with splitting $b + c$ in $\tau$ into $\{b, c\}$. Since
splitting $b + c$ into $\{b, c\}$ may have already been coupled with splitting $c$ into
$\{b, c - b\}$, let $p$ be the remaining probability of splitting $b + c$ into $\{b, c\}$.
Then, couple staying in place in $\sigma$ with splitting $b + c$ into $\{b, c\}$ in $\tau$ with
probability $\min(p, \frac{1}{n})$. This results in

\[ X_1 = (a_1, \ldots, a_m, b, c) \]
\[ Y_1 = (a_1, \ldots, a_m, b, c) \]

That is, the chains will couple.

Couple staying in place in $\sigma$ to staying in place in $\tau$ the rest of the time -
that is, with probability $\frac{1}{n} - \min(p, \frac{1}{n})$.

• Merging $b$ and $c$: Couple merging $b$ and $c$ in $\sigma$ to any remaining
possibilities in $\tau$. It is easy to check that these are either staying in place or
splitting $b + c$ into $\{r, b + c - r\}$. The first leads to the chains coupling;
the second leads to

\[ X_1 = (a_1, \ldots, a_m, b + c) \]
\[ Y_1 = (a_1, \ldots, a_m, r, b + c - r) \]

for some $r$.



Example 14. As this coupling looks fairly complicated, here are a couple of
examples. The possible pairs for $(X_1, Y_1)$ are listed, as well as the probability
of each pair.

1. Let $(X_0, Y_0) = (\sigma, \tau) = ((2, 3), (5))$. Here, there are no $a_i$, $b = 2$, $c = 3$,
and $b + c = 5$. A description is provided for each pair of moves: the first
move corresponds to $\sigma$, the second to $\tau$.

((1,1,3),(1,4)), p = 2/25: split 2 as {1,1}, split 5 as {1,4}
((1,2,2),(1,4)), p = 6/25: split 3 as {1,2}, split 5 as {1,4}
((2,3),(2,3)), p = 5/25: stay at $\sigma$, split 5 as {2,3}
((5),(1,4)), p = 2/25: merge 2 and 3, split 5 as {1,4}
((5),(2,3)), p = 5/25: merge 2 and 3, split 5 as {2,3}
((5),(5)), p = 5/25: merge 2 and 3, stay at $\tau$

2. Let $(X_0, Y_0) = (\sigma, \tau) = ((2, 1, 3), (2, 4))$, written with the above convention
that the parts $\sigma$ and $\tau$ disagree on are written last. Here, $a_1 = 2$, $b = 1$,
$c = 3$, and $b + c = 4$, and the first move again corresponds to $\sigma$, while the
second corresponds to $\tau$.

((1,1,1,3),(1,1,4)), p = 2/36: split 2 as {1,1} in both
((3,3),(6)), p = 4/36: merge 2 and 1, merge 2 and 4
((1,5),(6)), p = 12/36: merge 2 and 3, merge 2 and 4
((2,1,1,2),(2,1,3)), p = 6/36: split 3 as {1,2}, 4 as {1,3}
((2,1,3),(2,1,3)), p = 2/36: stay at $\sigma$, split 4 as {1,3}
((2,4),(2,2,2)), p = 4/36: merge 1 and 3, split 4 as {2,2}
((2,4),(2,4)), p = 2/36: merge 1 and 3, stay at $\tau$
((2,1,3),(2,4)), p = 4/36: stay at $\sigma$ and $\tau$
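
The coupling and the two examples can be reproduced mechanically. The sketch below builds the joint law of $(X_1, Y_1)$ following the five cases of the definition; it assumes the split-merge probabilities $2a/n^2$ (or $a/n^2$ for an even split) for unordered splits, $2ab/n^2$ for merges, and $1/n$ for staying. The name `coupled_step` and the convention of returning partitions as sorted tuples are illustrative choices.

```python
from collections import defaultdict
from fractions import Fraction


def canon(parts):
    return tuple(sorted(parts, reverse=True))


def coupled_step(a, b, c):
    """Joint law of (X_1, Y_1) from sigma = a + [b, c], tau = a + [b + c], b <= c."""
    a = list(a)
    n = sum(a) + b + c
    unit = Fraction(1, n * n)
    joint = defaultdict(Fraction)

    def split_prob(size, r):  # unordered split {r, size - r} with r <= size / 2
        return (2 if 2 * r < size else 1) * size * unit

    def add(x, y, p):
        if p > 0:
            joint[(canon(x), canon(y))] += p

    for i, ai in enumerate(a):
        rest = a[:i] + a[i + 1:]
        # Case 1: moves that only touch the shared parts a_i.
        for r in range(1, ai // 2 + 1):
            new = rest + [r, ai - r]
            add(new + [b, c], new + [b + c], split_prob(ai, r))
        for j in range(i + 1, len(a)):
            new = a[:i] + a[i + 1:j] + a[j + 1:] + [ai + a[j]]
            add(new + [b, c], new + [b + c], 2 * ai * a[j] * unit)
        # Case 2: merging b or c with a_i is matched with merging b + c with a_i.
        add(rest + [b + ai, c], rest + [b + c + ai], 2 * b * ai * unit)
        add(rest + [b, c + ai], rest + [b + c + ai], 2 * c * ai * unit)

    # Probability of each split {r, b + c - r} in tau that is not yet matched.
    rem = {r: split_prob(b + c, r) for r in range(1, (b + c) // 2 + 1)}

    # Case 3: splitting b or c is matched with splitting b + c at the same r.
    for r in range(1, b // 2 + 1):
        p = split_prob(b, r)
        add(a + [r, b - r, c], a + [r, b + c - r], p)
        rem[r] -= p
    for r in range(1, c // 2 + 1):
        p = split_prob(c, r)
        add(a + [b, r, c - r], a + [r, b + c - r], p)
        rem[r] -= p

    # Case 4: staying in sigma is matched with the split {b, c} as far as possible.
    stay = Fraction(1, n)
    q = min(rem[b], stay)
    add(a + [b, c], a + [b, c], q)  # the chains meet
    add(a + [b, c], a + [b + c], stay - q)
    rem[b] -= q

    # Case 5: merging b and c in sigma is matched with whatever is left in tau.
    add(a + [b + c], a + [b + c], q)  # tau stays in place; the chains meet
    for r, p in rem.items():
        add(a + [b + c], a + [r, b + c - r], p)
    return dict(joint)
```

Here `coupled_step([], 2, 3)` and `coupled_step([2], 1, 3)` correspond to the two examples above, with each partition reported in non-increasing order rather than with the differing parts written last.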



Going back to the general case, here is a check that the above definition
provides the correct marginal distribution for $Y_1$. Note that given the way that
the coupling was defined, it clearly provides the correct distribution for $X_1$.

Lemma 15. The coupling in Definition 13 has the correct marginal distribution
for $Y_1$.

Proof: Since $\sigma$ and $\tau$ share the parts $a_i$, the operations only using the $a_i$ are
distributed identically in both and hence pose no problem. Furthermore,

\[ \mathbb{P}(\text{Merge } b \text{ and } a_i \text{ in } \sigma) + \mathbb{P}(\text{Merge } c \text{ and } a_i \text{ in } \sigma) = \frac{2 b a_i}{n^2} + \frac{2 c a_i}{n^2} = \frac{2 (b + c) a_i}{n^2} \]
\[ = \mathbb{P}(\text{Merge } b + c \text{ and } a_i \text{ in } \tau) \]

Thus, all the operations involving any $a_i$ work properly.

Consider next operations that only involve $b$ and $c$ in $\sigma$. Splitting $b$ into
$\{r, b - r\}$ where $r \le \frac{b}{2}$ in $\sigma$ is coupled with splitting $b + c$ into $\{r, b + c - r\}$ in
$\tau$, and similarly for $c$. It needs to be checked that this is possible - that is, that
the probability of splitting $b + c$ into $\{r, b + c - r\}$ in $\tau$ is sufficiently large to
accommodate all these moves in $\sigma$.

There are a number of possibilities. First of all, if $r \le \frac{b}{2}$, then clearly
$r < \frac{b + c}{2}$, and hence according to Definition 2,

\[ \mathbb{P}(\text{Split } b + c \text{ into } \{r, b + c - r\} \text{ in } \tau) = \frac{2 (b + c)}{n^2} = \frac{2 b}{n^2} + \frac{2 c}{n^2} \]
\[ \ge \mathbb{P}(\text{Split } b \text{ into } \{r, b - r\} \text{ in } \sigma) + \mathbb{P}(\text{Split } c \text{ into } \{r, c - r\} \text{ in } \sigma) \]

In this case, the probability of splitting $b + c$ into $\{r, b + c - r\}$ in $\tau$ is sufficiently
large.

Now, if $\frac{b}{2} < r \le \frac{c}{2}$, the procedure couples splitting $b + c$ into $\{r, b + c - r\}$
with splitting $c$ into $\{r, c - r\}$. Thus, since in this case $r$ is still less than $\frac{b + c}{2}$,

\[ \mathbb{P}(\text{Split } b + c \text{ into } \{r, b + c - r\} \text{ in } \tau) = \frac{2 (b + c)}{n^2} \ge \frac{2 c}{n^2} \ge \mathbb{P}(\text{Split } c \text{ into } \{r, c - r\} \text{ in } \sigma) \]

which again works.

Finally, if $r > \frac{c}{2}$, splitting $b + c$ into $\{r, b + c - r\}$ is not coupled to splitting
either $b$ or $c$ in $\sigma$, which obviously does not pose a problem. None of the other
moves considered in Definition 13 could be an issue, and hence the marginal
distribution of $Y_1$ under this definition is correct. □

The next step proves Lemma 12. This states that the coupled chains stay
at most one step apart, and that

\[ \mathbb{P}(X_{t+1} = Y_{t+1}) \ge \frac{4 s(\sigma, \tau)}{n^2} \]

Proof of Lemma 12: It should be clear from Definition 13 that the coupling
stays at most one step apart for all $t$. To show that if $(X_t, Y_t) = (\sigma, \tau)$, where
$\rho(\sigma, \tau) = 1$, then

\[ \mathbb{P}(X_{t+1} = Y_{t+1}) \ge \frac{4 s(\sigma, \tau)}{n^2} \]

let

\[ \sigma = (a_1, \ldots, a_m, b, c) \]
\[ \tau = (a_1, \ldots, a_m, b + c) \]

where $b \le c$. Then by Definition 11, $s(\sigma, \tau) = b$.

From Definition 13, the chains can couple either if $\sigma$ stays in place, or if $b$
and $c$ are merged in $\sigma$. Consider those two cases separately.



Staying in place in $\sigma$: The chains will couple if $\sigma$ stays in place and $b + c$ is
split in $\tau$ into $\{b, c\}$. As noted in the definition, these are coupled together with
probability $\min(p, \frac{1}{n})$, where $p$ is the remaining probability of splitting $b + c$
into $\{b, c\}$ in $\tau$ - the probability that this split hasn't already been coupled to
something else. To find a lower bound on $p$, first note that splitting $b + c$ into
$\{b, c\}$ in $\tau$ couldn't have been coupled with any splits of $b$ in $\sigma$. However, it
might have been coupled with a split of $c$ in $\sigma$. Consider two cases: $c < 2b$ and
$c \ge 2b$.

If $c < 2b$, then splitting $c$ into $\{c - b, b\}$ in $\sigma$ is coupled to splitting $b + c$ into
$\{c - b, 2b\}$ in $\tau$ since $c - b < b$. This means that nothing is coupled to splitting
$b + c$ into $\{b, c\}$, and therefore

\[ p = \mathbb{P}(\text{Splitting } b + c \text{ into } \{b, c\} \text{ in } \tau) \ge \frac{b + c}{n^2} \ge \frac{2b}{n^2} \tag{3.1} \]

If $c \ge 2b$, then splitting $b + c$ into $\{b, c\}$ in $\tau$ is indeed coupled with splitting
$c$ into $\{b, c - b\}$ in $\sigma$. In this case, clearly $b \ne c$, and hence

\[ \mathbb{P}(\text{Splitting } b + c \text{ into } \{b, c\} \text{ in } \tau) = \frac{2 (b + c)}{n^2} \]

Therefore,

\[ p \ge \mathbb{P}(\text{Splitting } b + c \text{ into } \{b, c\} \text{ in } \tau) - \mathbb{P}(\text{Splitting } c \text{ into } \{b, c - b\} \text{ in } \sigma) \]
\[ \ge \frac{2 (b + c)}{n^2} - \frac{2c}{n^2} = \frac{2b}{n^2} \tag{3.2} \]

Equations (3.1) and (3.2) give $p \ge \frac{2b}{n^2}$. Furthermore, note that $b \le c$ and
$b + c \le n$, and hence $b \le \frac{n}{2}$. Therefore,

\[ \min\left(p, \frac{1}{n}\right) \ge \min\left(\frac{2b}{n^2}, \frac{1}{n}\right) = \frac{2b}{n^2} \tag{3.3} \]

Hence,

\[ \mathbb{P}(\text{Coupling if staying in place in } \sigma) = \min\left(p, \frac{1}{n}\right) \ge \frac{2b}{n^2} \tag{3.4} \]
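
Merging $b$ and $c$ in $\sigma$: Next, consider the probability of coupling if $b$ and $c$
are merged in $\sigma$. Clearly, this would need to be coupled with staying in place
in $\tau$. The only other thing that staying in place in $\tau$ could have been coupled
with so far is staying in place in $\sigma$. As noted in Definition 13,

\[ \mathbb{P}(\text{Both } \sigma \text{ and } \tau \text{ stay in place}) = \frac{1}{n} - \min\left(p, \frac{1}{n}\right) \]

for the same $p$ used above. Thus,

\[ \mathbb{P}(\text{Coupling if merging } b \text{ and } c \text{ in } \sigma) = \mathbb{P}(b \text{ and } c \text{ merged in } \sigma,\ \tau \text{ stayed}) \]
\[ = \frac{1}{n} - \mathbb{P}(\text{Both } \sigma \text{ and } \tau \text{ stayed}) = \min\left(p, \frac{1}{n}\right) \ge \frac{2b}{n^2} \tag{3.5} \]

using Equation (3.3) above.

Finally, combining Equations (3.4) and (3.5),

\[ \mathbb{P}(X_{t+1} = Y_{t+1}) \ge \frac{4b}{n^2} = \frac{4 s(\sigma, \tau)}{n^2} \]

as required. □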

Continuing with the proof, as sketched out earlier, the rest of this paper will
be concerned with showing that $s(X_t, Y_t)$ is of order $n$ after $O(n)$ time. The
next section shows how that proves Theorem 1 and provides a summary of the
proof.

4 Proof of Main Theorem Using $\mathbb{E}[s(X_t, Y_t)]$

As described above, one of the main tools of this paper is the following theorem:

Theorem 16. There exist constants $\alpha$ and $\beta$ such that for all $t \ge \alpha n$,

\[ \mathbb{E}[s(X_t, Y_t)] \ge \beta n \]

This section uses the above result to prove Theorem 1. To start, prove the
following easy lemma:

Lemma 17. Let $(X_t, Y_t)$ be defined as in Definition 13, where as usual $\rho(X_0, Y_0)$
is equal to 1. Let $\alpha$ and $\beta$ be the constants in Theorem 16 above. Then,

\[ \mathbb{P}\left(X_{\alpha n + \frac{n}{2}} = Y_{\alpha n + \frac{n}{2}}\right) \ge \beta \]

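Proof: Since by Lemma 12 $\mathbb{P}(X_t = Y_t)$ is non-decreasing, if $\mathbb{P}(X_t = Y_t) \ge \beta$
for any $t \le \alpha n + \frac{n}{2}$, the argument is complete. Thus, assume that

\[ \mathbb{P}(X_t = Y_t) < \beta \tag{4.1} \]

for all $t \le \alpha n + \frac{n}{2}$.
Clearly,

\[ \mathbb{P}(X_{t+1} = Y_{t+1}) = \mathbb{P}(X_t = Y_t) + \mathbb{P}(X_{t+1} = Y_{t+1} \mid X_t \ne Y_t) \, \mathbb{P}(X_t \ne Y_t) \]

Rearranging, and using Lemma 12,

\[ \mathbb{P}(X_{t+1} = Y_{t+1}) - \mathbb{P}(X_t = Y_t) \ge \mathbb{E}\left[ \frac{4 s(X_t, Y_t)}{n^2} \;\middle|\; X_t \ne Y_t \right] \mathbb{P}(X_t \ne Y_t) \]
\[ = \frac{4}{n^2} \, \mathbb{E}\left[ s(X_t, Y_t) \mid X_t \ne Y_t \right] \mathbb{P}(X_t \ne Y_t) \]

A lower bound is now needed for the right-hand side. Assume that $t \ge \alpha n$, and
hence that $\mathbb{E}[s(X_t, Y_t)] \ge \beta n$ by Theorem 16. Then,

\[ \mathbb{E}\left[ s(X_t, Y_t) \mid X_t \ne Y_t \right] \mathbb{P}(X_t \ne Y_t) = \mathbb{E}[s(X_t, Y_t)] - \mathbb{E}\left[ s(X_t, Y_t) \mid X_t = Y_t \right] \mathbb{P}(X_t = Y_t) \]
\[ \ge \beta n - \frac{n}{2} \, \mathbb{P}(X_t = Y_t) \]

since if $X_t = Y_t$, $s(X_t, Y_t) = \frac{n}{2}$. Furthermore, using Equation (4.1),

\[ \mathbb{E}\left[ s(X_t, Y_t) \mid X_t \ne Y_t \right] \mathbb{P}(X_t \ne Y_t) \ge \frac{\beta n}{2} \]

Combining this with the bound obtained from Lemma 12 above,

\[ \mathbb{P}(X_{t+1} = Y_{t+1}) - \mathbb{P}(X_t = Y_t) \ge \frac{2 \beta}{n} \]

for all $\alpha n \le t \le \alpha n + \frac{n}{2}$. Adding up these inequalities for all $t$ in $[\alpha n, \alpha n + \frac{n}{2}]$ gives

\[ \mathbb{P}\left(X_{\alpha n + \frac{n}{2}} = Y_{\alpha n + \frac{n}{2}}\right) \ge \beta \]

as required.

□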



For path coupling, a lemma about the diameter of $\mathcal{P}_n$ under the split-merge
random walk is needed.

Lemma 18. The diameter of $\mathcal{P}_n$ under the split-merge random walk is at most
$n$.

Proof: Proceed by induction on $n$. This statement is clearly true for $n = 1$.
Now, assume it's true for all $m \le n - 1$, and show it for $n$. Let $\sigma = (a_1, \ldots, a_k)$
and $\tau = (b_1, \ldots, b_l)$ be two partitions of $n$. Without loss of generality, assume
that $a_1 \ge b_1$.

If $a_1 = b_1$, create a path from $\sigma$ to $\tau$ by just changing the parts $(a_2, \ldots, a_k)$
to $(b_2, \ldots, b_l)$. Since $(a_2, \ldots, a_k)$ is a partition of $n - a_1$, by the inductive
hypothesis,

\[ \rho(\sigma, \tau) \le n - a_1 \le n - 1 \]

so this case follows.

Otherwise, $a_1 > b_1$. Let $\sigma_1$ be $\sigma$ with $a_1$ split into $(b_1, a_1 - b_1)$. Then, $\sigma_1$
and $\tau$ match on the part $b_1$, and hence by the argument above,

\[ \rho(\sigma_1, \tau) \le n - 1 \]

Since $\sigma$ is a neighbor of $\sigma_1$, this implies that $\rho(\sigma, \tau) \le n$, completing the proof.

□ 
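
The diameter bound can be confirmed by exhaustive search for small $n$. The sketch below is an illustration with hypothetical function names: it enumerates the partitions of $n$, lists the split and merge neighbors of each, and runs a breadth-first search from every partition.

```python
from collections import deque


def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def neighbours(p):
    """Partitions reachable from p by one split or one merge."""
    parts = list(p)
    out = set()
    for i, a in enumerate(parts):
        rest = parts[:i] + parts[i + 1:]
        for r in range(1, a // 2 + 1):
            out.add(tuple(sorted(rest + [r, a - r], reverse=True)))
        for j in range(i + 1, len(parts)):
            others = parts[:i] + parts[i + 1:j] + parts[j + 1:]
            out.add(tuple(sorted(others + [a + parts[j]], reverse=True)))
    return out


def diameter(n):
    """Largest split-merge distance between two partitions of n."""
    best = 0
    for start in partitions(n):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            p = queue.popleft()
            for q in neighbours(p):
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
        best = max(best, max(dist.values()))
    return best
```

Since every step changes the number of parts by at most one, the partitions $(n)$ and $(1, \ldots, 1)$ are at distance $n - 1$, so `diameter(n)` always lies between $n - 1$ and $n$.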

Theorem 1 is now proved using path coupling. It shows an $O(n \log n)$ bound
on the split-merge random walk, and hence on the random transposition walk.

Proof of Theorem 1: Let $t_1 = \alpha n + \frac{n}{2}$. Consider the walk $(\tilde{X}_k)_{k \ge 1}$, where
each step consists of making $t_1$ steps of the split-merge random walk. Let
$(\tilde{X}_k, \tilde{Y}_k)$ be the coupling on this new walk induced by the current coupling
$(X_t, Y_t)$. Now, Lemma 17 shows that if $(\tilde{X}_0, \tilde{Y}_0) = (\sigma, \tau)$, where $\rho(\sigma, \tau) = 1$,
then

\[ \mathbb{E}\left[\rho(\tilde{X}_1, \tilde{Y}_1)\right] = \mathbb{E}\left[\rho(X_{t_1}, Y_{t_1})\right] = \mathbb{P}(X_{t_1} \ne Y_{t_1}) \le (1 - \beta) \rho(\sigma, \tau) \]

using the fact that $\rho(X_t, Y_t)$ is always either 0 or 1. Therefore, if $\tilde{d}(k)$ is defined
to be the distance from stationarity of $(\tilde{X}_k)$, then from Theorem 9,

\[ \tilde{d}(k) \le \mathrm{diam}(\mathcal{P}_n) (1 - \beta)^k \]

Since neighboring pairs are pairs that are one step apart in the split-merge
random walk, Lemma 18 implies that $\mathrm{diam}(\mathcal{P}_n) \le n$. Also using the fact
that $1 - x \le e^{-x}$,

\[ \tilde{d}(k) \le n e^{-\beta k} \]

Thus, if $k = \frac{2 \log n}{\beta}$, then $\tilde{d}(k) \le e^{-\log n} < \frac{1}{4}$. But it's clear from the definition of the
new walk that

\[ d(k t_1) = \tilde{d}(k) \]

Thus,

\[ d\left( \frac{2 \alpha + 1}{\beta} \, n \log n \right) = d(k t_1) < \frac{1}{4} \]

which means that the walk has mixed by time $\frac{2 \alpha + 1}{\beta} n \log n$, completing
the proof. □



5 Proving $E[s(X_t, Y_t)]$ is large

Let us now summarize the rest of the proof. The remainder of this paper will be devoted to proving Theorem 16, which states that after $O(n)$ time, the expected value of $s(X_t, Y_t)$ is of order $n$.

The proof will be structured as follows: it is shown that in $O(n)$ time, $s(X_t, Y_t)$ will have a high probability of being at least of order $n^{1/3}$. Then it is shown that it takes another $o(n)$ time for $s(X_t, Y_t)$ to have a high probability of being of order $n$. This will clearly suffice to show that after $O(n)$ time, $E[s(X_t, Y_t)]$ is of order $n$. Section 6 below will be concerned with growing $s(X_t, Y_t)$ to order $n^{1/3}$, while Section 7 will be concerned with growing it to order $n$.

Before stating the theorems and sketching their proofs, a number of useful 
definitions are needed. Note that some of these definitions are asymmetrical: 
they are defined in terms of Xt and not Yt. This is an arbitrary choice; since 
the pair {Xt,Yt) is only a step apart, it doesn't make any difference. 

Definition 19. For $v \in \{1, 2, \ldots, n\}$, define $C_t(v)$ to be the cycle of $X_t$ containing $v$. Furthermore, for a number $x$, define

$$V_t(x) = \{v \in \{1, 2, \ldots, n\} \mid |C_t(v)| \ge x\}.$$

Thus, $V_t(x)$ is the union of all cycles of size at least $x$.

Remark 20. Note that if $X_t = (a_1, a_2, \ldots, a_m)$, then

$$|V_t(x)| = \sum_{a_i \ge x} a_i.$$

Thus, the size of $V_t(x)$ is a function of $X_t$.
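To make Definition 19 and Remark 20 concrete, here is a small sketch that computes $C_t(v)$ and $V_t(x)$ for a permutation. It assumes a representation that is not in the paper: points are labelled $0, \ldots, n-1$ and a permutation is a list `perm` with `perm[i]` the image of `i`; the function names are illustrative.

```python
def cycle_of(perm, v):
    """Cycle of the permutation `perm` that contains the point v."""
    cycle, w = [v], perm[v]
    while w != v:
        cycle.append(w)
        w = perm[w]
    return cycle


def V(perm, x):
    """Set of points lying in cycles of size at least x."""
    return {v for v in range(len(perm)) if len(cycle_of(perm, v)) >= x}


def cycle_type(perm):
    """Cycle sizes in nonincreasing order, i.e. the partition (a_1, ..., a_m)."""
    seen, sizes = set(), []
    for v in range(len(perm)):
        if v not in seen:
            c = cycle_of(perm, v)
            seen.update(c)
            sizes.append(len(c))
    return sorted(sizes, reverse=True)


perm = [1, 2, 0, 4, 3, 5]          # cycles (0 1 2), (3 4), (5)
sizes = cycle_type(perm)
# Remark 20: |V(x)| depends only on the cycle type.
print(len(V(perm, 2)), sum(a for a in sizes if a >= 2))
```

Both printed numbers agree, since the size of $V_t(x)$ is just the total size of the parts of the cycle type that are at least $x$.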

The first proposition that grows $s(X_t, Y_t)$ to order $n^{1/3}$ is now stated.

Proposition 21. Let $(X_t, Y_t)$ be the usual coupling started at $(X_0, Y_0) = (\sigma, \tau)$, where $\rho(\sigma, \tau) \le 1$. Then, for $n$ sufficiently large and $t \ge 9n$,

$$P\left\{s(X_t, Y_t) \ge n^{1/3},\ |V_t(n^{1/3})| \ge \frac{n}{2}\right\} \ge \frac{1}{2}.$$

Remark 22. Here, the choice of n^/'^ is in some sense arbitrary - any n", where 
a < ^, would have done just as well. 

A few other definitions are needed for the statement of the theorem growing $s(X_t, Y_t)$ from order $n^{1/3}$ to order $n$. Indeed, a more general theorem is proved. Fix constants $\epsilon$ and $\delta$: then, if $s(X_t, Y_t)$ starts by being of size $2^{j+1}$ (where $j$ can be a function of $n$), after a certain amount of time $q$, $s(X_{t+q}, Y_{t+q})$ has a high probability of being at least $\epsilon\delta n$. The following definition introduces some notation necessary for stating the theorem; it currently looks completely inexplicable, but will be justified in Section 7.

Definition 23. Assume $\epsilon$ and $\delta$ are fixed constants, and $j$ is a number (possibly a function of $n$). Then, define

$$K = \lceil \log_2(\epsilon\delta n) \rceil \tag{5.1}$$

Furthermore, for $r$ between $j$ and $K$ define

$$a_r = \lceil 2\delta^{-1} 2^{-r} n(\log_2 n - r) \rceil \quad \text{and} \quad \tau_r = \sum_{i=j}^{r-1} a_i \tag{5.2}$$

where as usual, $\lceil \cdot \rceil$ stands for the ceiling function.
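The quantities in Definition 23 are easy to tabulate, which helps in seeing how short the total schedule $\tau_K$ is compared with $n$. The sketch below assumes the reading $a_r = \lceil 2\delta^{-1}2^{-r}n(\log_2 n - r)\rceil$ and $\tau_r = \sum_{i=j}^{r-1} a_i$; the function name `schedule` and the sample parameters are illustrative choices, not values from the paper.

```python
import math


def schedule(n, j, eps, delta):
    """Return K, the block lengths a_r and the start times tau_r."""
    K = math.ceil(math.log2(eps * delta * n))
    a = {r: math.ceil(2 / delta * 2.0 ** (-r) * n * (math.log2(n) - r))
         for r in range(j, K + 1)}
    tau = {j: 0}                       # tau_j is an empty sum
    for r in range(j, K):
        tau[r + 1] = tau[r] + a[r]     # tau_{r+1} = tau_r + a_r
    return K, a, tau


n = 2 ** 30
# j is chosen so that 2^(j+1) = n^(1/3); eps lies in (0, 1/32).
K, a, tau = schedule(n, j=9, eps=1 / 64, delta=1 / 2)
print("K =", K, " tau_K =", tau[K], " tau_K / n =", tau[K] / n)
```

Because the block lengths $a_r$ decay geometrically in $r$, the sum $\tau_K$ is dominated by its first term, which is of order $n \log n \cdot 2^{-j}$.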

The following proposition proves that $s(X_t, Y_t)$ grows to order $n$.

Proposition 24. Let $(X_t, Y_t)$ be the usual coupling started at $(X_0, Y_0) = (\sigma, \tau)$, where $\rho(\sigma, \tau) \le 1$. Let $j$ be a number and let $\delta \in (0, 1]$ be a constant such that $|V_0(2^{j+1})| \ge \delta n$ and $s(\sigma, \tau) \ge 2^{j+1}$. If $K$ and $\tau_K$ are defined as in Definition 23 and $\epsilon \in (0, 1/32)$, then

$$P\{s(X_{\tau_K}, Y_{\tau_K}) < \epsilon\delta n\} \le O(1)\delta^{-1}\epsilon\,|\log(\epsilon\delta)| \tag{5.3}$$

where the constant implied in the $O(1)$ notation is universal.

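Proof of Theorem 16. Propositions 21 and 24 can be used to prove Theorem 16: let $t_1 \ge 9n$, and condition on $(X_{t_1}, Y_{t_1}) \in Q_{t_1}$, where

$$Q_{t_1} = \{(X_{t_1}, Y_{t_1}) \text{ such that } s(X_{t_1}, Y_{t_1}) \ge n^{1/3},\ |V_{t_1}(n^{1/3})| \ge n/2\} \tag{5.4}$$

Letting $2^{j+1} = n^{1/3}$ and $\delta = 1/2$, if $(X_{t_1}, Y_{t_1}) \in Q_{t_1}$, then

$$s(X_{t_1}, Y_{t_1}) \ge 2^{j+1} \quad \text{and} \quad |V_{t_1}(2^{j+1})| \ge \delta n.$$

Since $\rho(X_{t_1}, Y_{t_1}) \le 1$, Proposition 24 applies to pairs $(X_{t_1}, Y_{t_1})$ in $Q_{t_1}$. Therefore, averaging over $(X_{t_1}, Y_{t_1}) \in Q_{t_1}$,

$$P\{s(X_{t_1+\tau_K}, Y_{t_1+\tau_K}) < \epsilon\delta n \mid (X_{t_1}, Y_{t_1}) \in Q_{t_1}\} \le O(1)\delta^{-1}\epsilon\,|\log(\epsilon\delta)|$$

for any $\epsilon \in (0, 1/32)$. Now, pick $\epsilon$ such that the right hand side of the above inequality is at most 1/2. Then,

$$P\{s(X_{t_1+\tau_K}, Y_{t_1+\tau_K}) \ge \epsilon\delta n \mid (X_{t_1}, Y_{t_1}) \in Q_{t_1}\} \ge \frac{1}{2}$$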

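and therefore, for sufficiently large $n$,

$$P\{s(X_{t_1+\tau_K}, Y_{t_1+\tau_K}) \ge \epsilon\delta n\} \ge \frac{1}{2}\,P\{(X_{t_1}, Y_{t_1}) \in Q_{t_1}\} \ge \frac{1}{4}$$

using Proposition 21. Therefore,

$$E[s(X_{t_1+\tau_K}, Y_{t_1+\tau_K})] \ge \frac{\epsilon\delta n}{4} \tag{5.5}$$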
It now just remains to show that $t_1 + \tau_K$ can be of order $n$. Since $\delta = 1/2$ and $2^{j+1} = n^{1/3}$, by Equation (5.2),

$$\tau_K = \sum_{r=j}^{K-1} a_r \le \sum_{r=j}^{K-1} \left(2\delta^{-1} 2^{-r} n \log_2 n + 1\right) = O\left(n \log n \cdot 2^{-(j+1)}\right) = O(n^{2/3} \log n).$$

Since $t_1 \ge 9n$ is arbitrary and $\tau_K$ is $o(n)$, Equation (5.5) implies that

$$E[s(X_t, Y_t)] \ge \frac{\epsilon\delta n}{4}$$

for all $t \ge 10n$, which is precisely what is needed. □

Before the next two sections, in which Propositions 21 and 24 are proved, some technical results are needed. These are proved in Section 8 below, and are instrumental for controlling the probabilities in the next two sections.

Lemma 25. Let $\sigma$ be in $S_n$, and let $(X_t)_{t \ge 0}$ be the random transposition walk starting at $\sigma$. Then, the expected number of $v$ such that $|C_1(v)| < |C_0(v)|$ and $|C_1(v)| < x$ is no greater than $2x^2/n$.
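For small $n$ the expectation in Lemma 25 can be computed exactly by enumerating all $n^2$ ordered choices of the two cards. The sketch below does this and compares the result with $2x^2/n$; that value of the bound is an assumption read off from how the lemma is applied in Section 7, and the helper names and the 0-based list representation of permutations are illustrative.

```python
from fractions import Fraction


def cycle_sizes(perm):
    """size[v] = length of the cycle of `perm` containing v."""
    size = [0] * len(perm)
    for v in range(len(perm)):
        if size[v] == 0:
            cyc, w = [v], perm[v]
            while w != v:
                cyc.append(w)
                w = perm[w]
            for u in cyc:
                size[u] = len(cyc)
    return size


def expected_shrunk(perm, x):
    """Exact expected number of v whose cycle shrinks to size below x in one step."""
    n = len(perm)
    before = cycle_sizes(perm)
    total = 0
    for i in range(n):              # right-hand card
        for j in range(n):          # left-hand card (may equal i)
            step = list(perm)
            step[i], step[j] = step[j], step[i]   # apply the transposition (i, j)
            after = cycle_sizes(step)
            total += sum(1 for v in range(n)
                         if after[v] < before[v] and after[v] < x)
    return Fraction(total, n * n)


perm = [1, 2, 3, 4, 5, 6, 7, 0]     # a single 8-cycle
x = 3
print(expected_shrunk(perm, x), "versus bound", Fraction(2 * x * x, len(perm)))
```

Only transpositions that split a cycle contribute, which is why the count is small compared with $n$ whenever $x$ is small.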

For the next four lemmas, let $(X_t, Y_t)$ be the usual coupling starting at $(\sigma, \tau)$, where $\rho(\sigma, \tau) = 1$, $s(\sigma, \tau) = b$ and $m(\sigma, \tau) = c$.

Lemma 26. If $x \le c$, then

$$P\{m(X_1, Y_1) < x\} \le \frac{2x^2}{n^2}.$$
Lemma 27. If x < c, and if \Vo{y)\ > R, then 

nmiXi,Y^)>x + y}>^^^^^ 
Lemma 28. If $x \le b$, then

$$P\{s(X_1, Y_1) < x\} \le \frac{4x^2}{n^2}.$$

Lemma 29. If $x$ and $y$ satisfy $x \le b \le x + y \le c$, and $|V_0(y)| \ge R$, then

$$P\{s(X_1, Y_1) \ge x + y\} \ge \frac{2b(R - 3x - 3y)}{n^2}.$$
6 Growing to $\Theta(n^{1/3})$

This section proves Proposition 21. It makes a lot of use of the results of Schramm in "Compositions of random transpositions" [19]. A number of definitions are needed to state his main result.

Definition 30. If $(X_t)_{t \ge 0}$ is the random transposition walk, define $G_t$ to be the graph on $\{1, 2, \ldots, n\}$ such that $\{u, v\}$ is an edge in $G_t$ if and only if the random transposition $(u, v)$ has appeared in the first $t$ steps of our walk. Furthermore, let $W_t$ denote the set of vertices of the largest component of $G_t$.
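The graph $G_t$ of Definition 30 can be tracked alongside the walk with a union-find structure. The sketch below is an illustration under stated assumptions: points are labelled $0, \ldots, n-1$, the walk starts at the identity, and the names `run_walk` and `largest_component_size` are not from the paper. It also checks the standard fact that every cycle of $X_t$ lies inside a single component of $G_t$.

```python
import random
from collections import Counter


def run_walk(n, t, seed=0):
    """Run t steps of the walk; return (perm, component label of each vertex of G_t)."""
    rng = random.Random(seed)
    perm = list(range(n))          # identity permutation on {0, ..., n-1}
    parent = list(range(n))        # union-find forest for the components of G_t

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    for _ in range(t):
        u, v = rng.randrange(n), rng.randrange(n)   # two independent uniform picks
        perm[u], perm[v] = perm[v], perm[u]          # compose with the transposition (u, v)
        parent[find(u)] = find(v)                    # add the edge {u, v} to G_t
    return perm, [find(u) for u in range(n)]


def largest_component_size(labels):
    """|W_t|: number of vertices in the largest component of G_t."""
    return max(Counter(labels).values())


n = 2000
perm, labels = run_walk(n, t=n, seed=1)
# Each point and its image share a component, so cycles never cross components.
print(all(labels[perm[u]] == labels[u] for u in range(n)))
print("|W_t| / n =", largest_component_size(labels) / n)
```

Since cycles cannot cross components, the component sizes of $G_t$ bound the cycle sizes of $X_t$ from above, which is why the giant component is the natural normalization for the large cycles.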

Note that the behavior of the $W_t$ defined above is well-understood; indeed, by an Erdős-Rényi theorem (see for example [?]), if $t = cn$, then

$$\frac{|W_t|}{n} \to z(2c) \tag{6.1}$$

in probability as $n \to \infty$, where $z(s)$ is the positive solution of $1 - z = e^{-zs}$.

Definition 31. The Poisson-Dirichlet (PD(1)) distribution is a probability measure on the infinite dimensional simplex $\Omega = \{(x_1, x_2, \ldots) \mid \sum_{i=1}^{\infty} x_i = 1\}$. Sample from this simplex as follows: let $U_1, U_2, \ldots$ be an i.i.d. sequence of random variables uniform on $[0, 1]$. Then, set $x_1 = U_1$, and recursively,

$$x_j = U_j \left(1 - \sum_{i=1}^{j-1} x_i\right).$$

Let $(y_i)$ be the $(x_i)$ sorted in nonincreasing order; then, the PD(1) distribution is defined as the law of $(y_i)$.

The main theorem (Theorem 1.1) of Schramm's paper [19] can now be stated. This remarkable result was proved using the tools of graph theory and coupling. A clever lemma showing that vertices that start in 'sufficiently large' cycles are likely to end up in cycles of order $n$ also played a pivotal role (Lemma 35 below is an almost exact reproduction of the result). The full strength of the result is not needed: while Schramm determines the law of the large parts of $X_t$, the only fact necessary here is that after a sufficiently long time, these cycles are of order $n$. For this theorem, treat $X_t$ as an infinite vector by adding infinitely many 0s at the end of it.

Theorem 32 (Schramm). Let $c > 1/2$, and take $t = cn$. As $n \to \infty$, the law of $X_t/|W_t|$ converges weakly to the PD(1) distribution; that is, for every $\epsilon > 0$, if $n$ is sufficiently large and $t \ge cn$, then there is a coupling of $X_t$ and a PD(1) sample $Y$ such that

$$P\left\{\left\|Y - \frac{X_t}{|W_t|}\right\|_\infty < \epsilon\right\} > 1 - \epsilon \tag{6.2}$$

where $\|\cdot\|_\infty$ is the standard $l^\infty$ distance.
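The sampling recipe in Definition 31 is a stick-breaking procedure: each new piece is a uniform fraction of whatever remains of the unit interval. The sketch below is a rough illustration, assuming a truncation tolerance `tol` (an implementation convenience that is not part of the definition) and illustrative function names. It also estimates the total mass of the parts of size at least $x$, whose expectation for a PD(1) sample is $1 - x$.

```python
import random


def sample_pd1(rng, tol=1e-12):
    """Approximate PD(1) sample: stick-breaking until less than `tol` remains."""
    parts, remaining = [], 1.0
    while remaining > tol:
        piece = rng.random() * remaining   # x_j = U_j * (1 - x_1 - ... - x_{j-1})
        parts.append(piece)
        remaining -= piece
    return sorted(parts, reverse=True)     # the y_i: parts in nonincreasing order


def mass_of_large_parts(y, x):
    """Sum of the parts of y that are at least x."""
    return sum(p for p in y if p >= x)


rng = random.Random(0)
samples = [sample_pd1(rng) for _ in range(4000)]
x = 0.25
estimate = sum(mass_of_large_parts(y, x) for y in samples) / len(samples)
print("estimated mass of parts >= x:", estimate, " compare with 1 - x =", 1 - x)
```

The truncation only discards parts smaller than `tol`, so it does not affect the mass of parts above any fixed threshold $x > $ `tol`.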
The proof that follows uses Theorem 32 to show that at time $t = n$, more than half the vertices are in cycles of order $n$ with high probability. This is used to 'grow' $m(X_t, Y_t)$ to order $n^{1/3}$, after which the same is done for $s(X_t, Y_t)$. The results for $m(X_t, Y_t)$ are needed before the results for $s(X_t, Y_t)$: since $s(X_t, Y_t) \le m(X_t, Y_t)$, $m(X_t, Y_t)$ constrains the growth of $s(X_t, Y_t)$ from above. Good control on $m$ is needed before tackling $s$.

Lemma 33. Let $k$ be a natural number not dependent on $n$. For sufficiently large $n$, that is, for $n \ge N = N(k)$,

$$P\{|V_n(n/k)| \ge n/2\} \ge 1 - \frac{6}{k}.$$

Proof: For convenience of notation, let $X = (x_1, x_2, \ldots)$ be $X_n$, let $Q = (q_1, q_2, \ldots)$ be $X_n/|W_n|$, and let $Y = (y_1, y_2, \ldots)$ be a PD(1) sample which is coupled with $Q$ to satisfy Theorem 32 above. With current notation,

$$|V_n(n/k)| = \sum_{x_i \ge n/k} x_i \tag{6.3}$$

For the rest of the proof, fix $\epsilon = \frac{1}{9k}$. First note that Equation (6.1) implies that

$$\frac{|W_n|}{n} \to z(2) \approx 0.797$$

in probability, which means that $\lim_{n \to \infty} P\{|W_n|/n < 3/4\} = 0$. Since $Q = X_n/|W_n|$, for sufficiently large $n$,

$$P\left\{x_i \ge \frac{3n}{4} q_i \text{ for all } i\right\} \ge 1 - \epsilon.$$

Furthermore, Theorem 32 implies that for sufficiently large $n$,

$$P\{q_i \ge y_i - \epsilon \text{ for all } i\} \ge 1 - \epsilon.$$

Combining the above two equations,

$$P\left\{x_i \ge \frac{3n}{4}(y_i - \epsilon) \text{ for all } i\right\} \ge 1 - 2\epsilon \tag{6.4}$$

for sufficiently large $n$.

Thus, to estimate $|V_n(n/k)|$ it suffices to consider the large parts of the PD(1) sample $Y$. To that end, define the random variable

$$G_Y(x) = \sum_{y_i \ge x} y_i.$$

It is easy to check that $E[G_Y(x)] = 1 - x$, and therefore $E[1 - G_Y(x)] = x$. Thus, Markov's inequality implies that

$$P\{G_Y(x) < 3/4\} = P\{1 - G_Y(x) > 1/4\} \le 4x.$$

Recall that $\epsilon = \frac{1}{9k}$. Then, combining the above (with $x = \frac{13}{9k}$) with Equation (6.4),

$$P\left\{x_i \ge \frac{3n}{4}(y_i - \epsilon) \text{ for all } i,\ G_Y\!\left(\frac{13}{9k}\right) \ge \frac{3}{4}\right\} \ge 1 - 2\epsilon - \frac{4 \cdot 13}{9k} = 1 - \frac{6}{k} \tag{6.5}$$

Finally, assume that $x_i \ge \frac{3n}{4}(y_i - \epsilon)$ for each $i$, and that $G_Y\!\left(\frac{13}{9k}\right) \ge \frac{3}{4}$. If $y_i \ge \frac{13}{9k}$, then $x_i \ge \frac{3n}{4} \cdot \frac{12}{9k} = \frac{n}{k}$, so Equation (6.3) implies that

$$|V_n(n/k)| \ge \sum_{y_i \ge 13/9k} \frac{3n}{4}\left(y_i - \frac{1}{9k}\right) \ge \frac{3n}{4}\left(\frac{3}{4} - \frac{9k}{13} \cdot \frac{1}{9k}\right) = \frac{3n}{4}\left(\frac{3}{4} - \frac{1}{13}\right) \ge \frac{n}{2} \tag{6.6}$$

using the fact that there can be at most $\frac{9k}{13}$ values of $y_i$ that are at least $\frac{13}{9k}$, since the $y_i$ are positive and sum to 1. Therefore, using Equation (6.5), for sufficiently large $n$

$$P\{|V_n(n/k)| \ge n/2\} \ge 1 - \frac{6}{k}$$

as required.

□
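The constants in the proof of Lemma 33 fit together by simple arithmetic, which can be checked exactly with rational numbers. The sketch below assumes the values $\epsilon = 1/(9k)$ and the cutoff $13/(9k)$, which is how the fragments $13/9k$ and $6/k$ in the proof are read here; the function name is illustrative.

```python
from fractions import Fraction


def lemma33_constants(k):
    """Exact values of the three quantities used in the proof, for a given k."""
    eps = Fraction(1, 9 * k)          # assumed value of epsilon
    cutoff = Fraction(13, 9 * k)      # assumed threshold for the PD(1) parts
    # Union bound: failure of (6.4) costs 2*eps, Markov's inequality costs 4*cutoff.
    prob_bound = 1 - 2 * eps - 4 * cutoff
    # A part y_i >= cutoff forces x_i >= (3n/4)(cutoff - eps); this is the coefficient of n.
    part_bound = Fraction(3, 4) * (cutoff - eps)
    # Mass bound: at most 1/cutoff parts exceed the cutoff, each losing eps.
    mass_bound = Fraction(3, 4) * (Fraction(3, 4) - eps / cutoff)
    return prob_bound, part_bound, mass_bound


for k in (7, 12, 100):
    print(k, lemma33_constants(k))
```

For every $k$ the three values come out as $1 - 6/k$, $1/k$ and $105/208$, and the last of these is just above $1/2$, which is exactly what the final display of the proof needs.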



The above lemma is now applied to find a $t$ of order $n$ such that the probability of having $m(X_t, Y_t) \ge n^{1/3}$ is sufficiently high. Lemmas 26 and 27 give control of $m(X_t, Y_t)$.

Lemma 34. If $n$ is sufficiently large and $t \ge 5n$, then

F{m{XuY,)>n'/\\v,(n'/')\>^}>^ 
Proof: From Lemma 33, at time $t = n$,

$$P\{|V_t(n/k)| \ge n/2\} \ge 1 - \frac{6}{k} \tag{6.7}$$

Average over the possible values of $X_{t-n}$ to conclude that Equation (6.7) also holds for any time $t \ge n$. Now, for convenience of notation, define

$$S_t = \{(X_t, Y_t) \text{ s.t. } |V_t(n^{1/3})| \ge n/2\} \tag{6.8}$$

For sufficiently large $n$, $n^{1/3} \le n/k$ for any fixed value of $k$. Fix $\epsilon > 0$. Then, for $t \ge n$ and sufficiently large $n$, Equation (6.7) implies that

$$P(S_t) \ge 1 - \epsilon. \tag{6.9}$$

Furthermore, define

$$A_t = \{(X_t, Y_t) \mid m(X_t, Y_t) \ge n^{1/3}\}.$$

To find a lower bound for $P(A_t \cap S_t)$ for $t \ge 10n$, note that

$$P(A_t \cap S_t) \ge P(A_t) - P(S_t^c) \ge P(A_t) - \epsilon \tag{6.10}$$

and hence it suffices to bound $P(A_t)$. This is done using a recursive argument: at each step $t$, calculate the probability that $m(X_t, Y_t)$ was too small, but $m(X_{t+1}, Y_{t+1})$ is large enough, and vice versa. The probability of $A_t$ is shown to grow sufficiently quickly with $t$.

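Start by bounding the probability that $m(X_{t+1}, Y_{t+1}) < n^{1/3}$ if $m(X_t, Y_t) \ge n^{1/3}$. By Lemma 26 with $x = n^{1/3}$,

$$P\{(X_{t+1}, Y_{t+1}) \notin A_{t+1} \mid (X_t, Y_t) \in A_t\} \le \frac{2x^2}{n^2} = \frac{2}{n^{4/3}}$$

and therefore

$$P\{(X_{t+1}, Y_{t+1}) \notin A_{t+1},\ (X_t, Y_t) \in A_t\} \le \frac{2}{n^{4/3}}\,P(A_t) \tag{6.11}$$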

Now bound the probability that m{Xt, Yj) < n^/^, while m{Xt+i,Yt+i) > n^/^. 
In order to bound this in a satisfactory way, enough parts of size n^/^are needed; 
accordingly, work with {Xt,Yt) G A^fHSt. lfm{Xt,Yt) < n^l^ and \Vt (n^/^) | > 
2-, then using Lemma [57] with x — ^,y — n^/-^, and i? = 

2(n/2-2ni/3) i_g 

p{(Xi+i,r4+i) e A+i e n > ^ ' , ^ > 

for sufficiently large n. Thus, for t > n, using the fact that F{St) > 1 — e. 



P{{Xt+i,Yt+,) e At+i, {Xt,Yt) ^At}>[^] P{At n 5t) 



n 



(6.12) 



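for sufficiently large $n$. Combining Equations (6.11) and (6.12),

$$P(A_{t+1}) - P(A_t) \ge -\frac{2}{n^{4/3}}\,P(A_t) + \left(\frac{1 - \epsilon}{n}\right)(1 - P(A_t) - \epsilon) \ge \frac{1 - P(A_t) - 3\epsilon}{n}$$

for sufficiently large $n$ and $t \ge n$. Rearranging the above,

$$(1 - 3\epsilon - P(A_{t+1})) \le \left(1 - \frac{1}{n}\right)(1 - 3\epsilon - P(A_t)) \tag{6.13}$$

and hence using recursion and the lower bound in Equation (6.10),

$$(1 - 3\epsilon - P(A_t)) \le \left(1 - \frac{1}{n}\right)^{t-n} \le e^{-(t-n)/n}$$

and therefore

$$P(A_t \cap S_t) \ge 1 - 4\epsilon - e^{-(t-n)/n}.$$

Thus, for $t \ge 5n$, $P(A_t \cap S_t) \ge 1 - 4\epsilon - e^{-4} \approx 1 - 4\epsilon - 0.018$, and picking $\epsilon$ appropriately completes the proof. □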
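The recursion in Equation (6.13) can be iterated numerically to see how quickly the lower bound on $P(A_t)$ approaches $1 - 3\epsilon$. The sketch below treats the worst case, in which $P(A_n) = 0$ and the recursion holds with equality at every step; the function name and the sample values of $n$ and $\epsilon$ are illustrative.

```python
import math


def worst_case_lower_bound(n, eps, t_end):
    """Iterate p_{t+1} = p_t + (1 - 3*eps - p_t) / n from p_n = 0 up to time t_end."""
    p = 0.0
    for _ in range(n, t_end):
        p += (1 - 3 * eps - p) / n
    return p


n, eps = 1000, 0.01
p = worst_case_lower_bound(n, eps, 5 * n)
print("lower bound at t = 5n:", p)
print("closed-form bound 1 - 3*eps - exp(-4):", 1 - 3 * eps - math.exp(-4))
```

The iterated value stays above the closed-form bound because $(1 - 1/n)^{4n} \le e^{-4}$, which is the inequality used in the last step of the proof.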
Proposition 21 (Restatement). For sufficiently large $n$, and $t \ge 9n$,

$$P\left\{s(X_t, Y_t) \ge n^{1/3},\ |V_t(n^{1/3})| \ge \frac{n}{2}\right\} \ge \frac{1}{2}.$$
Proof: This proof is very similar to the one above. Let t > 5n, and define 

From the above lemma, F{Rt) > |- Now, define 



Ct = {{Xt,Yt)\s{Xt,Yt)>n'/''} 



It is shown below that P(Cf D Rt) > ^, which will clearly suffice. Note that for 
t > 5n, 

nCtnRt)>F{Ct)-^ (6.f4) 

and hence it suffices to find a lower bound on P(Ct). As above, this is done by 
finding recursive bounds on the probability of Ct+i given the probability of Ct- 
By Lemma [25] with x — n^/^, 

4a;^ 4 

P{(Xt+i,ri+i) ^ Ct+i I iXt,Yt) eCt}< — 



n' 



4/3 



and therefore 



p{(Xt+i,yt+i) i Ct+i, {Xt,Yt) e Ct} < ^nct) 



(6.f5) 



Now, assume that iXt,Yt) e n Rt. Then m{Xt,Yt) > n^/^ > s{Xt,Yt) 
and Vt (n^/^) > ^. Therefore, using Lemma [^ with x — 0,y — n^^^, and 



^ - 2' 



2(n/2-3ni/3) ^ i 



,5/3 



Thus, for t > 5n, using the fact that P(i?t) > |, 



> 



1 6 



rt rt' 



5/3 



nCt n 



P(Ct) 



(6.16) 



for sufficiently large n. Therefore, combining Equations (|6.15p and (|6.16p and 
picking n sufficiently large. 



P(Ct+i)-P(Ct) > 



> 



„4/3 

3/4 - F{Ct) 



)+ r 



1 



6 



n n 



5/3 



(6.17) 
for $t \ge 5n$. Rearranging analogously to Equation (6.13),

$$\left(\frac{3}{4} - P(C_{t+1})\right) \le \left(1 - \frac{1}{n}\right)\left(\frac{3}{4} - P(C_t)\right).$$

As before, for t > 9n, P(Cf) > j^. Combining this with Equation (|6.14p . 

nCtDRt) > i 

for t > 9n and n sufficiently large, as required. □ 

7 Growing to $\Theta(n)$

This section proves Proposition 24, which shows that $s(X_t, Y_t)$ can be grown to order $n$. This section is structured similarly to the previous one: proving a lemma about overall cycle sizes, then a lemma about $m(X_t, Y_t)$, and then finally Proposition 24. Again, use is made of the technical results in Lemmas 25 through 29.

The idea behind the proof is largely based on Lemma 2.3 from "Compositions of random transpositions" [19]. Let $\epsilon$, $\delta$ and $j$ be chosen as in Proposition 24. Recall that Definition 23 defines $K = \lceil \log_2(\epsilon\delta n) \rceil$ and

$$a_r = \lceil 2\delta^{-1} 2^{-r} n(\log_2 n - r) \rceil \quad \text{and} \quad \tau_r = \sum_{i=j}^{r-1} a_i$$

for $r$ between $j$ and $K$, with $\tau_j = 0$. Then, define

$$I_r = [\tau_r, \tau_{r+1} - 1] \tag{7.1}$$

and for convenience of notation, define $I_K = \{\tau_K\}$.

As should be clear from the statement of Proposition 24, the argument starts with $s(\sigma, \tau) \ge 2^{j+1}$ and $|V_0(2^{j+1})| \ge \delta n$, and shows that at time $\tau_K$, the probability that $s(X_{\tau_K}, Y_{\tau_K})$ is less than $\epsilon\delta n$ is appropriately bounded above. In fact, something stronger is shown: for the intervals $I_r$ as defined above, one 'expects' to have

$$|V_t(2^{r+1})| \ge \frac{\delta n}{2}, \quad s(X_t, Y_t) \ge 2^r \quad \text{and} \quad m(X_t, Y_t) \ge 2^{r+1}$$

for all $r$ between $j$ and $K$. This would clearly suffice to prove the result.

The first lemma is almost identical to Lemma 2.3 from [19] - it is reproven here for completeness, and to illustrate the technique. This lemma starts with $\sigma \in S_n$, and $|V_0(2^{j+1})| \ge \delta n$. It gives an upper bound for the expected number of vertices that start in cycles of size at least $2^{j+1}$, and that are not in cycles of size $\epsilon\delta n$ at time $\tau_K$. This shows that 'most' vertices that start in cycles of size $2^{j+1}$ are in cycles of order $n$ at time $\tau_K$.
Lemma 35. Let $\sigma \in S_n$. Let $\delta \in (0, 1)$ be a constant such that $|V_0(2^{j+1})| \ge \delta n$, and let $K$ and $\tau_K$ be defined as they are above and in Definition 23. Fix $\epsilon \in (0, 1/32)$. For the random transposition walk $(X_t)_{t \ge 0}$,

$$E\,|V_0(2^{j+1}) \setminus V_{\tau_K}(2\epsilon\delta n)| \le O(1)\delta^{-1}\epsilon\,|\log(\epsilon\delta)|\,n \tag{7.2}$$

where the constant implied in the $O(1)$ notation is universal.

Proof: Before beginning the proof, consider what is being shown. Starting with a $\sigma$ such that $|V_0(2^{j+1})| \ge \delta n$ means that at least $\delta n$ of the vertices in $\sigma$ are in cycles of size at least $2^{j+1}$. An upper bound on the expected size of $V_0(2^{j+1}) \setminus V_{\tau_K}(2\epsilon\delta n)$ is needed: that is, an upper bound on the expected number of vertices that started off in cycles of size at least $2^{j+1}$ in $\sigma$, and ended up in cycles of size less than $2\epsilon\delta n$ at time $\tau_K$.

Something stronger is shown: conditioned on $v \in V_0(2^{j+1})$,

$$E\,|\{v \text{ s.t. } |C_t(v)| < 2^{r+1} \text{ for any } t \in I_r, \text{ for } r \in [j, K]\}| \le O(1)\delta^{-1}\epsilon\,|\log(\epsilon\delta)|\,n \tag{7.3}$$

This requires an upper bound on the expected number of vertices that for any time $t \in I_r$ are 'too small' for $I_r$: they are in cycles of size less than $2^{r+1}$. Note that the above set includes all vertices such that $|C_{\tau_K}(v)| < 2\epsilon\delta n \le 2^{K+1}$, and hence the above bound would suffice.

Three different possibilities are considered. First of all, an upper bound is needed on the expected number of vertices $v$ such that at any point, the cycle containing $v$ is split, and becomes too small. Secondly, all vertices that appear in permutations with an insufficient number of large parts are rejected. And thirdly, it is necessary to bound the possibility that the cycle containing $v$ does not grow sufficiently during $I_r$. Call the vertices that fall into any of these undesirable categories 'failed.'

In the next three sections, condition on $v \in V_0(2^{j+1})$: that is, assume that $v$ is in a cycle of size at least $2^{j+1}$ in $\sigma$. This means that $v$ has not failed at time 0.



The cycle containing v becomes too small Let r G [j,K — 1], and let 
t <E Ir + 1 = [Tr + l,Tr+i]. For Ct{v) to be of size 2''+^ by time r^+i, calculate 
the probability that for any t G + 1, the cycle containing v is split, and v is 
then contained in a cycle of size less than 2''"'"^. To be precise, define Ft to be the 
set of vertices at time t such that |Cf(w)| < |Ct_i(w)| and |Ci(u)| < 2''+^. Find 
the expected size of Ft: by definition, this is the expected number of vertices u, 
whose cycle is split from time t — 1 to time t, and which are in cycles of size less 
than 2''+^ at time t. By Lemma 

9/'9r+2^2 r,2r+5 

E\Ft\<'-^^^^- 

n n 

Now, define the cumulative set Ft — UL=i ^x- This is the set of all vertices 
up to time t, whose cycles have at any time x < t been split into ones that are 
'too small.' Clearly,



E 



22r+5 



K-1 



r=3 



< r2(5-i2-''n(log2 n - r)] 



22r+5 



r=3 

K-1 



< (25-i2-'^n(log2 n - r) + 1) 



22r+5 



(7.4) 



K-1 



K-1 



< ^26<5-i2'-(log2n-r)+^ 



22r+5 



Now, 



K-1 



Y rl'' = {K- 2)2^ - (j - 2)2J 

K-1 

^ 2'' = 2^ - 2-'' 



shows that 



E Fr^ <28|log2(e(5)|en 



(7.5) 



Permutations with insufficiently many large parts It is also necessary 
to rule out vertices in permutations for which the union of the 'large parts' 
isn't sufficiently high. This will be useful for the next part of the proof. To 
be more precise, let t € Ir- if |l4(2''~''^)| < Sn/2, and this is the first t for 
which the inequality holds, then consider all vertices in Xt to have failed, and 
set Ht = {1, . . . , n}. Otherwise, set Ht = 0. 

Again, define the cumulative set Ht = [JI-^qH^- This is the union of all 
vertices that up to time t have been in a permutation with insufBciently many 
large parts, by the above definition. It is clear that this set is either empty, 
or contains all the vertices. There is no current available upper bound on the 
expectation for Ht; one will be derived after the next section of the proof. 



The cycle containing v doesn't grow sufficiently Next, consider how a 

vertex v might fail at time t, if it does not fall into Ft or Ht-i- Assume t 
is the minimal time for which v fails: since failed vertices include all vertices 
contained in cycles that are 'too small', if s < t and s G /fe then |Cs(?;)| > 2*^+^. 
Now, assume that t, the first time at which v fails, is in 7^: thus, t — 1 is 
either in Ir-i or in Ir- Either way, since it was assumed that v is not in F^ , 
it can't be that \Ctiv)\ < |Ct_i(L')| and \Ctiv)\ < 2*^+1. Since the vertex v 
fails at time t, Ctiv) must contain fewer than 2''+^ vertices. Combine this with 
the preceding statement to conclude that $C_{t-1}(v)$ also contains fewer than $2^{r+1}$
vertices. However, by definition the vertex v did not fail at time t ~ 1. This 
implies t — I must have been in Ir-i- Thus, the only remaining times at which 
vertices could fail are t = r,., for r e {j, j + 1, . . . ,K}. Having conditioned on 
V £ Vq{2^~^^), it may be concluded that v can't fail at time Tj = 0. 

Now, define to be the set of vertices at time that are not in F'^'-Ui/'^''^^, 
such that |C,-^(w)| < 2*"+^, and that have not failed previously. As before, define 
Br = [Sx=j and estimate the expected size of Br- 

Condition on v ^ Fr^ U Ht^-i and calculate the probability that v fails at 
r,-, given that it has not failed up to that time. First, for t G /r-i, |Ct(w)| > 2''. 
Furthermore, since v ^ Fr^, there was no time between t^-i and at which 
the cycle containing v was split to contain fewer than 2''"^^ vertices. This means 
that if V failed at time r^, then Ct{v) must have been of size less than 2''+^ for 
all t € [Tr-i,Tr — 1]. Therefore, for t G Ir-i, 

y < \Ct{v)\ < 2''+i (7.6) 

Furthermore, since v is not in Ht for any t G Ir-i, for every t G /r-i, 
|Vt(2'")| > Sn/2. Consider the probability that from time t to time t + 1, the 
cycle containing v is merged with a cycle of size at least 2^. By (|7.6p above, 
the size of Ct{v) is at least 2*^, so such a merge would result in Ct+i{v) > 2'^'^^. 
Using the above reasoning implies that |Ct-^(v)| > 2'"+^, and therefore v does 
not fail at time t^. Now, again by (|7.6p . the cycle containing v is of size at 
most 2''"'"^. Since |Vt(2'')| > 6n/2, this means the union of the cycles disjoint 
from Ct{v) of size at least 2*" contains at least 5n/2 — 2^~^^ vertices. Now, since 
r <K ^ \\0g2ieSn)], 2''+i < 2^+1 < 4e(5n, and since e < 1/32, 

^ _ 2''+i > — 
2 - 4 

Thus, the union of the cycles of size at least 2'' disjoint from Ct{v) is of size at 
least (5n/4, and therefore 

P{Ct(w) merges with a cycle of size > 2"^} > 2 • 2''^^ = 2''~'^Sn-'^ 

Clearly, for v to be in B,., it cannot be that Ct(y) merges with a cycle of size 
> 2^ for any t G /r-i- Therefore, 

P{wGB,.}< (l-2'-i(5n-i)"''-' (7.7) 
< exp(— 2''^^(5n^^ar-i) 
and since a^-i > 2(5^^2^''+-'^n(log2 n — r + 1), 

P{w G Br} < e2('-i-i°S2 ") 
Now, r~l<K — 1< log2(e(5n), and therefore, log2 n — ?■ + 1 < 0. Thus, 

F{v G Br} < e2(i°g2"-'-+i) < 2iog2"-'-+i = ^ 

n 
This yields



and therefore, 



E\Br\ < nP{v e Br} = 2 



E 



K 



K 



Bk\ <^E\Br\ < Xl^''"^ 

r=j r=j 

< 2^ < 6(571 + 1 



(7.8) 



Finally, bound the expected size of Ht, the set of vertices in permutations 
with insufficiently many large parts. Recall that for t S Ir, if | Vt(2''"'"^)| < Sn/2 
and t was the first time this inequality held, Ht was defined to be the set of all 
vertices, and it was otherwise defined to be the empty set. If Ht is non-empty, 
then the set of vertices in Xt that are in cycles of size less than 2^~^^ has size 
at least n — 6n/2 > 6n/2. Now, consider v in Ht such that |Ct(w)| < 2*"+^. By 
definition, v has failed by time t, and v is not in H^ for any s < t. Therefore, 
each such vertex is in FtU Br- Thus, 



E 



H^ 



< nl 



Fr^ U Bk 



> 



5n/2^ 



< 2S-^E 



Fr^ U Bk 



so using (|7.5p and (|7.8p above, 



E 



H^ 



< 25-^{eSn+l + 2^\log{e6)\en) 

< {2^ + l)e6-'\log,ie6)\n 



(7.9) 



as desired. Finally, adding up the expectations for Hrj^ , Bk and Ft-^^ in 
(|7.8p and (|7.5p completes the proof. 



□ 



The next lemma is similar. It shows that $m(X_t, Y_t)$ becomes sufficiently large at time $\tau_K$. The proof is almost entirely analogous; the only substantial difference is in the bound for the probability of $X_t$ having insufficiently many 'large parts.' For this bound, Equation (7.9) above has to be used. Lemmas 26 and 27 will also be used.

Lemma 36. Assume $\rho(\sigma, \tau) = 1$. Let $j$ be a natural number such that $m(\sigma, \tau) \ge 2^{j+1}$, and let $\delta \in (0, 1]$ be a constant such that $|V_0(2^{j+1})| \ge \delta n$. Let $K$ and $\tau_K$ be defined as above, and let $\epsilon \in (0, 1/16)$. Then,

$$P\{m(X_{\tau_K}, Y_{\tau_K}) < 2\epsilon\delta n\} \le O(1)\delta^{-1}\epsilon\,|\log(\epsilon\delta)| \tag{7.10}$$

where the constant implied in the $O(1)$ notation is universal.



Proof: This proof is almost exactly analogous to the previous one, except that instead of keeping track of failed vertices, failed pairs of partitions will be tracked. Something stronger is shown:

$$P\{m(X_t, Y_t) < 2^{r+1} \text{ for any } t \in I_r, \text{ for } r \in [j, K]\} \le O(1)\delta^{-1}\epsilon\,|\log(\epsilon\delta)| \tag{7.11}$$
Again, the argument requires upper bounds on three different cases: the one where $m(X_t, Y_t)$ shrinks to become too small at time $t$, the one where $X_t$ doesn't have sufficiently many large parts, and the one where $m(X_t, Y_t)$ fails to grow sufficiently during $I_r$. The only major difference in the proof is use of the bound from Lemma 35 to bound the probability of $X_t$ having insufficiently many large parts.

Since the quantities specified are precisely analogous, use the names J-t,Bt 
and Hf 



Probability m{Xt,Yt) gets too small during For t E I,. + 1, define 

Ft to be the set of pairs [Xt^Yt] such that m{Xt,Yt) < m{Xt-i, Xt-i) and 
m{Xt,Yt) < 2''+2. Apply Lemma above. Let x = min(2'^+2, rn(Xf_i, 
Then, x < m{Xt^i,Yt^i), and therefore from Lemma the probability that 
m{Xt,Yt) is less than x is bounded above by By definition of Tt, this 

means that 

2x2 (2'-+2)' 22'-+5 



n 



Define the cumulative set J-t — IJ^^j^ J'x- Therefore, 

X — l 

and doing a calculation almost identical to fj7.5ll . 

P{-^,^}<28e|log2(e<5)| (7.12) 

Note that the only difference in the calculation was an extra factor of n in the 
denominator. 



Probability Xt doesn't have enough large parts Define Ht almost ex- 
actly as Ht in the last lemma, except that instead of making it a set of vertices, 
let it be a set of pairs {Xt,Yt). {Xt,Yt) is included in Ht precisely when Xt 
doesn't have enough large parts: that is, ii t G Ir, then {Xt,Yt) is in Ht if 
|^j-2r+i-j| ^ Sn/2, and t is the first time for which this inequality holds. Define 
Ht as usual to be the cumulative set. 

Clearly, if {Xt,Yt) G Ht, then Ht contains n vertices, and otherwise Ht is 
empty. Since |Vb(2^+^)| > Sri, the results derived in Lemma 1551 can be used. 
Therefore, 

F{Ht} = -E\Ht\ 
n 

and thus, from (|7.9p above, 

P{Ht} < (2^ + 1)6(5-1 |iog2(e^)| (7.13) 
Probability $m(X_t, Y_t)$ doesn't grow sufficiently during $I_r$. As before,
the only remaining times that {Xt,Yt) can fail is at times r^. Accordingly, 
define Br to be those pairs {Xr^,Yr^) that are not in or "Hr^-i, such that 
m{XT-^,Yr^) < 2''+^ and that have not failed previously. As before, if {Xr^, Yr^) 
is in Br, then it had not failed in J^-i, and therefore, for t S /r-i, TO(A^t,i^t) > 
2^. Furthermore, since {Xr^ , Y^^) is not in J>^, it must be that m{Xt, Yt) is less 
than 2''+i for t e /^-i- Thus, for t e 

2''<m(Xt,yi) <2''+' (7.14) 

Furthermore, since Br is disjoint from Hr^-i, for every i G /r-i; |^i(2^)| > 5n/2. 
Since m(A:t,rt) > 2'', Lemma [17] holds with R = (5n/2 and a; = y 2\ Let 
c = m(A'i,yf). Thus, for any t G Ir, 

and since e < -j^, and r < K < log2(e(5n) + 1, (5n/2 - 2*"+^ > Thus, 

nHXt+i,Yt+i) > 2'^+!} > T-'Sn-^ 

Finally, the probability of Br is the probability that m{Xt+i,Yt+i) isn't at least 
2''+^ for any t <E Ir, and therefore, 

V{Br} < {l-2'-^Sn-^y'-' 

and since this is precisely the same inequality as in (j7.7p . 

2r-l 



F{Br} < 
and hence 

P{-Bif}<^P{B.}<^^<e<5 + - (7.15) 

r=j r=j 

Thus, adding (Tm . ([71^ . and (fTTC)) . 

P{m{Xt, Yt) < 2"^+^ for any t £ for r e [j, K]} < 

(29 + 28 + l)erMlog2(e5)| ^^'^^^ 

which is what is needed. □ 

The stage is almost set to prove an analogous result for $s(X_t, Y_t)$. As above, the two technical Lemmas 28 and 29 are used. As in the previous section, $m(X_t, Y_t)$ must be 'sufficiently large' to allow $s(X_t, Y_t)$ to grow. This is the reason for proving the lemma concerning $m(X_t, Y_t)$ first.
Proposition 24 (Restatement). Let $(X_t, Y_t)$ be the usual coupling started at $(X_0, Y_0) = (\sigma, \tau)$, where $\rho(\sigma, \tau) \le 1$. Let $j$ be a number and let $\delta \in (0, 1]$ be a constant such that $|V_0(2^{j+1})| \ge \delta n$ and $s(\sigma, \tau) \ge 2^{j+1}$. If $K$ and $\tau_K$ are defined as in Definition 23 and $\epsilon \in (0, 1/32)$, then

$$P\{s(X_{\tau_K}, Y_{\tau_K}) < \epsilon\delta n\} \le O(1)\delta^{-1}\epsilon\,|\log(\epsilon\delta)|$$

where the constant implied in the $O(1)$ notation is universal.



Proof of Proposition 24. This proof is analogous to the proofs of Lemmas 35 and 36, except that the previous two lemmas are used to bound the probability that $s(X_t, Y_t)$ shrinks or grows. As before, a stronger statement is proved:

$$P\{s(X_t, Y_t) < 2^r \text{ for any } t \in I_r, \text{ for } r \in [j, K]\} \le O(1)\delta^{-1}\epsilon\,|\log(\epsilon\delta)| \tag{7.17}$$

Again, bounds are needed for a number of different cases: for the probability that $s(X_t, Y_t)$ shrinks to become too small during $I_r$, the probability that $X_t$ doesn't have enough large parts, and the probability that $s(X_t, Y_t)$ doesn't grow sufficiently on $I_r$. Furthermore, note that Lemma 29 requires the assumption that $m(\sigma, \tau) \ge 2x$ to lower bound the probability that $s(X_1, Y_1) \ge 2x$. Since $s(X_t, Y_t)$ must grow during $I_r$ to be at least $2^{r+1}$ by $\tau_{r+1}$, $m(X_t, Y_t)$ must be at least $2^{r+1}$ on $I_r$. Lemma 36 is used to bound the probability that $m(X_t, Y_t)$ is too small.

The quantities are precisely analogous to the ones in the two similar previous 
lemmas. Accordingly, name them ^t, J^, and ^t, using the same letters but 
yet another font. The new quantity is added, as discussed above. 

Probability s{Xt, Yt) gets too small during L^ For t G Ir+1 = [t, +1, t^+i], 
define to be the set of pairs {Xt, Yt) such that s{Xt, Yt) < s{Xt-i,Yt-i) and 
s{Xt, Yt) < T+^. Apply Lemma[lH]above. Define x = min(2''+i, s(Xf_i, Ft.i)). 
Then, x < s{Xt-i,Yt-i), and therefore Lemma [28l applies. Plugging it in, the 
probability that s{Xt, Yt) is less than x is at most Thus, 

4x2 4(2'-+i)2 22'-+4 
rt^ n"^ 

Now, define the cumulative set J^t = Ul=i ■^x- Then, 

TK K-l 2'2r+4 



a: — 1 r—j 

Doing a calculation identical to the one in f|7.5p and f|7.12p . 

n^r^}<'2'e\\og,{e6)\ (7.18) 
Probability Xt doesn't have enough large parts For t G /,-, define Ml
very similarly to before, to be the set of {Xt,Yt) such that |Vt(2'')| < 5n/2, 
whenever this is the first t for which this inequality holds. Define Mt to be the 
usual cumulative set. Now, from Lemma 1551 

nt^{{Xt,Yt) \ \Vt{2^+^)\ < 5n/2} 

Since ^((2' ) D Vt{2''+'^), clearly ■^t^'Ht, and therefore, using ({71^ 

P{^} < V{Ht} < (2^ + l)eS-^ |log2(e<5)| (7.19) 



Probability m{Xt,Yt) is too small For t ^ Ir, define to be the set of all 

{Xt,Yt) such that m{Xt,Yt) < 2^+^. As usual, define to be the cumulative 
set. Since at the start s(ct, r) > 2-'+^, m{a,T) > 2-'+^ is forced. By assumption, 
Vo{2^~^^) > Sn, so any inequalities derived in Lemma [551 are in force. Thus, from 
Equation ((77T5)) . 

P{m{Xt, Yt) < 2''+i for any t G for r e [j, K]} < (2^ + 2^ + l)e6-^ |log2(e5) 
and clearly, from the definition of 

P{^*} < (2^ + 2^ + l)eS-^ |log2(e(5)| (7.20) 



Probability s{Xt,Yt) doesn't grow sufficiently during As before, the 
only remaining times that s{Xt,Yt) can fail is at time t^. Therefore, define 
to be the set of {Xr^,Yr^) that are not in =^r^, J^t^-i or ^r,., such that 
s(XT-^,yT-,J < 2^ and that have not failed previously. If (Xt-^,1V,,) is in £§r, 
then it had not failed in Ir-i, and therefore for t e Ir-i, s{Xt,Yt) > 2^~^. 
Furthermore, since {Xt-^,Yt-^) is not in =^t,., for t G /r-i, s{Xt,Yt) < 2''. Thus, 

for t G /r-l, 

2''"' <s(Xt,yO <2'' (7.21) 
Furthermore, since (X^^,X^ ) is not in ^r,., for t G Ir~i 

m{Xt,Yt)>2- 

Finally, since is disjoint from for every t G /r~i, |Vt(2''^^)| > Sn/2. 

Now apply Lemma with R = 6n/2 and x = y = 2^^^. For any t E Ir, 

m (Y V ^ ^Q.i ^ 2x(j?-3a;-3y) 2-^(^/2 ~ 3 ■ 20 
P{s(Af+i, Ft+i) > 2 I > 2 = 2 

Since r < if = [log2(e(5n)l , and since e < 3 . 2'" < 6e(5n < Thus, 

P{s(Xt+i,y(+i) >2''} >2'^-2^n-i 

Finally, the probability of 3§r is the probability that s(Xf+i, Yt+i) isn't at least 
2*"+^ for any t G /r, and therefore, 

P{^4 < (1 - 2'-'^5n-^Y'''' < cxp{^2'^'^Sn-^ar-i) 
Now, since a^-i = \25-^2-''+'^n{log2 n-r + 1)], 

the probability above is at most 

\[ e^{r-1-\log_2 n} \le 2^{r-1-\log_2 n} = \frac{2^{r-1}}{n}, \]

using the fact that r < K < log2 n + 1, and hence r — 1 — log2 n < 0. Therefore, 
P{ < V P{^J < V ^ < — <e5+i (7.22) 

r=j r=j 

Now, adding (7.18), (7.19), (7.20) and (7.22), 

P{s(Xt,yt) < 2'' for any t £ for r £ [j,K]} < \\og{eS)\ (7.23) 

as required. □ 

Remark 37. Assiduously tracking down all the constants in the above argu- 
ment shows that the mixing time was bounded above by 2^^nlogn or so. This, 
of course, is very far from the correct answer of $\frac{1}{2} n \log n$. While this argument 
can almost certainly be mildly tweaked to give a less intimidating answer such as 
$10 n \log n$, it is unlikely that it could be manipulated to give the right constant. 



8 Technical Lemmas 

In this section, the technical results in Lemmas 25 through 29 are proved. For 
the convenience of the reader, the results are restated. 



Lemma 25 (Restatement). Let $\sigma$ be in $S_n$, and let $(X_t)_{t \ge 0}$ be the random 
transposition walk starting at $\sigma$. Then, the expected number of $v$ such that 
$|C_1(v)| < |C_0(v)|$ and $|C_1(v)| < x$ is no greater than $x^2/n$. 

Proof: Let $\sigma = (a_1, \ldots, a_m)$. Clearly, the only way that $|C_1(v)| < |C_0(v)|$ is 
if the cycle containing $v$ is split; furthermore, the only way that $|C_1(v)| < x$ 
is if $v$ winds up in a piece of size less than $x$. The 'ordered' splitting formula 
shows that the probability of splitting $a_i$ into $(r, a_i - r)$ is $a_i/n^2$. Consider the 
cases where either $r < x$ or $a_i - r < x$. Thus, summing over the possible $a_i$, 

\[ \mathbb{E}\,\bigl|\{v \text{ s.t. } |C_1(v)| < |C_0(v)|,\ |C_1(v)| < x\}\bigr| \le \sum_{i=1}^{m} \left( \sum_{r=1}^{x-1} r\,\frac{a_i}{n^2} + \sum_{r=a_i-x+1}^{a_i-1} (a_i - r)\,\frac{a_i}{n^2} \right). \]

It's clear that 

\[ \sum_{r=a_i-x+1}^{a_i-1} (a_i - r)\,\frac{a_i}{n^2} = \sum_{r=1}^{x-1} r\,\frac{a_i}{n^2} = \frac{a_i}{n^2} \sum_{r=1}^{x-1} r \le \frac{a_i x^2}{2n^2}. \]

Therefore, 

\[ \mathbb{E}\,\bigl|\{v \text{ s.t. } |C_1(v)| < |C_0(v)|,\ |C_1(v)| < x\}\bigr| \le \sum_{i=1}^{m} \frac{a_i x^2}{n^2} = \frac{x^2}{n^2} \sum_{i=1}^{m} a_i = \frac{x^2}{n^2} \cdot n = \frac{x^2}{n} \]

as required. □ 
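
As a sanity check on Lemma 25, the expectation can be computed exactly for small $n$ by enumerating all $n^2$ equally likely choices of the two cards. The Python sketch below is not part of the original argument. The helper names (`cycle_sizes`, `expected_shrunk_below`, `perm_with_cycle_type`) are hypothetical, and one step of the walk is modelled by swapping two entries of the one-line notation, which is assumed to match the walk up to the side on which the transposition is multiplied.

```python
from fractions import Fraction


def cycle_sizes(perm):
    """Return a list whose entry v is the size of the cycle of perm containing v."""
    n = len(perm)
    size = [0] * n
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        cycle = []
        v = start
        while not seen[v]:
            seen[v] = True
            cycle.append(v)
            v = perm[v]
        for u in cycle:
            size[u] = len(cycle)
    return size


def expected_shrunk_below(perm, x):
    """Exact expected number of v with |C_1(v)| < |C_0(v)| and |C_1(v)| < x."""
    n = len(perm)
    before = cycle_sizes(perm)
    total = 0
    for i in range(n):          # card picked with the right hand
        for j in range(n):      # card picked with the left hand (may equal i)
            step = list(perm)
            step[i], step[j] = step[j], step[i]
            after = cycle_sizes(step)
            total += sum(
                1 for v in range(n) if after[v] < before[v] and after[v] < x
            )
    return Fraction(total, n * n)


def perm_with_cycle_type(sizes):
    """Build a permutation (one-line notation) with the given cycle sizes."""
    perm = []
    offset = 0
    for a in sizes:
        perm.extend(offset + (k + 1) % a for k in range(a))
        offset += a
    return perm
```

For a single 6-cycle and $x = 3$, `expected_shrunk_below(perm_with_cycle_type([6]), 3)` evaluates to exactly 1, below the bound $x^2/n = 3/2$.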

For the next four lemmas, let $(X_t, Y_t)$ be our usual coupling starting at 
$(\sigma, \tau)$, where $\rho(\sigma, \tau) = 1$, $s(\sigma, \tau) = b$ and $m(\sigma, \tau) = c$. For these proofs, it will 
be useful to reference the original definition of the coupling and the possible 
pairs $(X_1, Y_1)$ in Definition 15. 

Lemma 26 (Restatement). If $x \le c$, then 

\[ \mathbb{P}\{m(X_1, Y_1) < x\} \le \frac{2x^2}{n^2}. \]

Proof: Let us assume without loss of generality that 

\[ \sigma = (a_1, \ldots, a_m, b, c), \qquad \tau = (a_1, \ldots, a_m, b + c). \tag{8.1} \]

Consider how $m(X_1, Y_1)$ could be smaller than $c$. Note that if an 
operation involving only the $a_i$ is performed on $\sigma$ and $\tau$, then $X_1$ and $Y_1$ will still differ in 
$b$, $c$ and $b + c$, so $m(X_1, Y_1) = c$. Furthermore, if $a_i$ is merged with $b$ in $\sigma$ and 
with $b + c$ in $\tau$, then $X_1$ and $Y_1$ will differ in the parts $(b + a_i, c, b + c + a_i)$, 
which are greater, respectively, than $(b, c, b + c)$. This means that $m(X_1, Y_1) \ge c$. 
Similar reasoning holds for merging $a_i$ with $c$ in $\sigma$, and hence these cases do not 
contribute to $\mathbb{P}\{m(X_1, Y_1) < x\}$. 

Also, note that if $b$ is split into $\{r, b - r\}$ for $r \le b/2$, then 

\[ X_1 = (a_1, \ldots, a_m, r, b - r, c), \qquad Y_1 = (a_1, \ldots, a_m, r, b + c - r). \]

Clearly, $c \ge b \ge b - r$, and therefore $m(X_1, Y_1) = c$. Thus, $m$ cannot decrease if 
$b$ is split in $\sigma$. This leaves two cases: splitting $c$ in $\sigma$, and merging $b$ and $c$ in $\sigma$. The 
cases in which the coupling meets can be ignored, since $m(\sigma, \sigma) = n \ge c$, and 
hence these cases do not contribute to $\mathbb{P}\{m(X_1, Y_1) < x\}$. 

Splitting $c$ in $\sigma$: If $c$ is split into $\{r, c - r\}$ for $r \le c/2$, then 

\[ X_1 = (a_1, \ldots, a_m, r, b, c - r), \qquad Y_1 = (a_1, \ldots, a_m, r, b + c - r). \]

Clearly, $m(X_1, Y_1) \ge c - r$. Thus, to have $m(X_1, Y_1) < x$, it must be that 
$c - r < x$, and thus $r > c - x$. By definition, $r \le c/2$, and hence 

\[ c - x < r \le \frac{c}{2}. \]

If $2x < c$, this set contains no elements, so assume for now that $2x \ge c$. Then 
the number of possible $r$ is at most $\frac{c}{2} - (c - x) = \frac{2x - c}{2}$. Since the probability of 
splitting $c$ into $\{r, c - r\}$ is at most $\frac{2c}{n^2}$ for each $r$, 

\[ \mathbb{P}\{m(X_1, Y_1) < x,\ c \text{ split in } \sigma\} \le \frac{c(2x - c)}{n^2} \le \frac{x^2}{n^2} \tag{8.2} \]

using the AM-GM inequality and the assumption that $2x - c \ge 0$. Furthermore, 
the above inequality also holds when $2x < c$, since in that case, the left-hand 
side is 0. 

Merging $b$ and $c$ in $\sigma$: If $b$ and $c$ are merged in $\sigma$, 

\[ X_1 = (a_1, \ldots, a_m, b + c), \qquad Y_1 = (a_1, \ldots, a_m, s, b + c - s) \]

for some $s \le \frac{b+c}{2}$. Hence, $m(X_1, Y_1) = b + c - s$. Again, to have $b + c - s < x$, it 
must be that $s > b + c - x$, and the probability of each split is at most $\frac{2(b+c)}{n^2}$. 
Thus, analogously to above, consider 

\[ b + c - x < s \le \frac{b + c}{2}, \]

and hence the total number of such $s$ is at most $\frac{2x - (b+c)}{2}$ if $2x \ge b + c$, and 
0 otherwise. Therefore, if $2x \ge b + c$, 

\[ \mathbb{P}\{m(X_1, Y_1) < x,\ b \text{ and } c \text{ merged in } \sigma\} \le \frac{(b + c)(2x - (b + c))}{n^2} \le \frac{x^2}{n^2} \tag{8.3} \]

again using AM-GM. This clearly also holds for $2x < b + c$. 
Finally, adding (8.2) and (8.3), 

\[ \mathbb{P}\{m(X_1, Y_1) < x\} \le \frac{2x^2}{n^2} \]

as required. □ 
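
The counting step behind (8.2) is easy to confirm by brute force. The following Python sketch, which is an illustration rather than part of the proof, counts the admissible split sizes $r$ directly and multiplies by $2c$, giving $n^2$ times the upper bound on the probability; the function name `split_bound_numerator` is hypothetical. The same count, with $b$ in place of $c$, applies to the splitting cases in the proof of Lemma 28.

```python
def split_bound_numerator(c, x):
    """Return 2c * #{r : 1 <= r <= c/2 and c - r < x}.

    This is n^2 times the bound obtained by charging at most 2c/n^2
    to each admissible split {r, c - r} of a part of size c.
    """
    admissible = [r for r in range(1, c // 2 + 1) if c - r < x]
    return 2 * c * len(admissible)


# The AM-GM step says this never exceeds x^2 when x <= c.
worst_gap = min(
    x * x - split_bound_numerator(c, x)
    for c in range(1, 81)
    for x in range(1, c + 1)
)
```

Here `worst_gap` is the smallest slack $x^2 - 2c \cdot \#\{r\}$ over the tested range; it is nonnegative, with equality for example when $x = c$ and $c$ is even.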
Lemma 27 (Restatement). If $x \le c$, and $|V_0(y)| \ge R$, then 

\[ \mathbb{P}\{m(X_1, Y_1) \ge x + y\} \ge \frac{2c(R - 2c)}{n^2}. \]

Proof: Consider both the possibilities that 

\[ \sigma = (a_1, \ldots, a_m, b, c), \qquad \tau = (a_1, \ldots, a_m, b + c) \tag{8.4} \]

and that 

\[ \sigma = (a_1, \ldots, a_m, b + c), \qquad \tau = (a_1, \ldots, a_m, b, c) \tag{8.5} \]

with $b \le c$, since $V_t(y)$ is defined for $X_t$ and not $Y_t$, and therefore the symmetry 
breaks down. Merging $c$ with an $a_i \ge y$ will result in $m(X_1, Y_1) = c + a_i \ge x + y$. 
To calculate the probability of such a merge, the sum of these $a_i$ is needed. 
In both cases (8.4) and (8.5), since $\sigma$ and $\tau$ agree on the $a_i$, 

\[ \sum_{a_i \ge y} a_i \ge |V_0(y)| - (b + c) \ge R - 2c \tag{8.6} \]

using Remark 20. For case (8.4), merging $c$ and some $a_i \ge y$ in $\sigma$ gives 

\[ X_1 = (a'_1, \ldots, a'_{m-1}, b, c + a_i), \qquad Y_1 = (a'_1, \ldots, a'_{m-1}, b + c + a_i) \]

where $\{a'_1, \ldots, a'_{m-1}\} = \{a_1, \ldots, a_m\} \setminus \{a_i\}$. Clearly, $c + a_i \ge b$, and therefore 
$m(X_1, Y_1) = c + a_i \ge x + y$. The probability of merging $c$ with $a_i$ in $\sigma$ is $\frac{2ca_i}{n^2}$, 
and thus 

\[ \mathbb{P}\{m(X_1, Y_1) \ge x + y\} \ge \sum_{a_i \ge y} \frac{2ca_i}{n^2} = \frac{2c}{n^2} \sum_{a_i \ge y} a_i \ge \frac{2c(R - 2c)}{n^2} \]

using Equation (8.6) for the last inequality. Thus, in case (8.4) the proof is 
finished. Furthermore, since Equation (8.6) is symmetric for the cases (8.4) and 
(8.5), the second case is completely analogous. □ 
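
The merge and split probabilities used throughout these proofs ($2ab/n^2$ for merging parts of sizes $a$ and $b$, and at most $2a/n^2$ for each unordered split of a part of size $a$) can be checked by enumerating one step of the walk. The Python sketch below is illustrative only: the names `cycle_type` and `one_step_distribution` are hypothetical, and a step is modelled as swapping two independently chosen entries of the one-line notation.

```python
from collections import Counter
from fractions import Fraction


def cycle_type(perm):
    """Return the sorted tuple of cycle sizes of perm (one-line notation)."""
    seen = [False] * len(perm)
    sizes = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        length = 0
        v = start
        while not seen[v]:
            seen[v] = True
            length += 1
            v = perm[v]
        sizes.append(length)
    return tuple(sorted(sizes))


def one_step_distribution(perm):
    """Exact distribution of the cycle type after one random transposition step."""
    n = len(perm)
    counts = Counter()
    for i in range(n):
        for j in range(n):      # i == j is allowed and leaves perm unchanged
            step = list(perm)
            step[i], step[j] = step[j], step[i]
            counts[cycle_type(step)] += 1
    return {t: Fraction(k, n * n) for t, k in counts.items()}


# Cycle type (4, 3, 2) on n = 9 points.
example = [1, 2, 3, 0, 5, 6, 4, 8, 7]
dist = one_step_distribution(example)
```

In this example, merging the parts 4 and 3 has probability $2 \cdot 4 \cdot 3 / 81$, splitting 4 into $\{1, 3\}$ has probability $2 \cdot 4 / 81$, and the balanced split of 4 into $\{2, 2\}$ has probability $4/81$, which is why the text says "at most" $2a/n^2$.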



Lemma 28 (Restatement). If $x \le b$, then 

\[ \mathbb{P}\{s(X_1, Y_1) < x\} \le \frac{4x^2}{n^2}. \]

Proof: For simplicity, assume without loss of generality that $\sigma$ and $\tau$ satisfy 
(8.1) above. In the same way as in Lemma 26 above, any operations involving $a_i$ 
cannot make $s(X_1, Y_1)$ smaller than $b$. Thus, the operations that might produce 
$s(X_1, Y_1) < x$ involve either splitting $b$ in $\sigma$, splitting $c$ in $\sigma$, or merging $b$ and 
$c$ in $\sigma$. Consider these cases separately. In the same way as before, the cases 
where the coupling meets can be ignored. 
Splitting $b$ in $\sigma$: Recall that if $b$ is split into $\{r, b - r\}$ for $r \le b/2$, then 

\[ X_1 = (a_1, \ldots, a_m, r, b - r, c), \qquad Y_1 = (a_1, \ldots, a_m, r, b + c - r). \]

Thus, $s(X_1, Y_1) = \min(b - r, c) = b - r$. To have $s(X_1, Y_1) < x$, $b - r < x$ is 
needed. Hence, consider $r$ such that 

\[ b - x < r \le \frac{b}{2}. \]

If $2x < b$, this set contains no elements, so assume $2x \ge b$. Clearly, the above 
set is of size at most $x - \frac{b}{2}$. The probability of splitting $b$ into $\{r, b - r\}$ is at 
most $\frac{2b}{n^2}$ for each $r \le \frac{b}{2}$, and therefore 

\[ \mathbb{P}\{s(X_1, Y_1) < x,\ b \text{ split in } \sigma\} \le \frac{2b}{n^2}\left(x - \frac{b}{2}\right) = \frac{b(2x - b)}{n^2} \le \frac{x^2}{n^2} \tag{8.7} \]

using AM-GM and the assumption that $2x \ge b$ for the last inequality. This 
clearly also holds if $2x < b$, since in that case the left-hand side is 0. 

Splitting $c$ in $\sigma$: This calculation is very similar to the above. The probability 
that $c$ is split into $\{r, c - r\}$, where $r \le \frac{c}{2}$ and $c - r < x$, is needed. Again, consider 

\[ c - x < r \le \frac{c}{2}, \]

and since the probability of a particular split is at most $\frac{2c}{n^2}$, assuming that 
$2x \ge c$, the total probability of all these cases is at most 

\[ \mathbb{P}\{s(X_1, Y_1) < x,\ c \text{ split in } \sigma\} \le \frac{c(2x - c)}{n^2} \le \frac{x^2}{n^2} \tag{8.8} \]

which again holds trivially when $2x < c$. 

Merging $b$ and $c$ in $\sigma$: Recall that merging $b$ and $c$ in $\sigma$ is coupled with 
splitting $b + c$ into $\{r, b + c - r\}$ in $\tau$, where each split in $\tau$ occurs with the 
probability that it has not already been coupled with a split of $b$ or $c$ in $\sigma$. 
Thus, in this case, 

\[ X_1 = (a_1, \ldots, a_m, b + c), \qquad Y_1 = (a_1, \ldots, a_m, r, b + c - r). \]

Assuming as usual that $r \le \frac{b+c}{2}$, $s(X_1, Y_1) = r$. Now calculate the probability 
that $r < x$. Define 

\[ P_r = \mathbb{P}\{b \text{ and } c \text{ merge in } \sigma,\ b + c \text{ splits into } \{r, b + c - r\} \text{ in } \tau\} \tag{8.9} \]

and bound $P_r$ for various values of $r$. Consider three different cases: 
• $r \le \frac{b}{2}$: In this case, splitting $b + c$ into $\{r, b + c - r\}$ in $\tau$ is coupled with 
both splitting $b$ into $\{r, b - r\}$ in $\sigma$ and with splitting $c$ into $\{r, c - r\}$ in $\sigma$. 
Thus, 

\[ P_r \le \frac{2(b + c)}{n^2} - \frac{2b}{n^2} - \frac{2c}{n^2} = 0. \]

• $\frac{b}{2} < r \le \frac{c}{2}$: In this case, splitting $b + c$ into $\{r, b + c - r\}$ in $\tau$ is coupled 
with splitting $c$ into $\{r, c - r\}$ in $\sigma$. Thus, 

\[ P_r \le \frac{2(b + c)}{n^2} - \frac{2c}{n^2} = \frac{2b}{n^2}. \]

• $\frac{c}{2} < r$: In this case, splitting $b + c$ into $\{r, b + c - r\}$ in $\tau$ isn't coupled with 
any splits in $\sigma$. Hence, 

\[ P_r \le \frac{2(b + c)}{n^2}. \]

Therefore, the reasoning above shows 

^'•< f =^ ^^<;^ (8-10) 

2b+2c c ^ „ ^ b+c 

where the right-hand inequality uses the fact that b < c. Therefore, 



Ar x{x — 1) 



f{s{Xi,Yi) <x,b and c merge in a} = ^ < ^ — = 4 ^ ^ 

r=l r— 1 

< (au) 

Adding Equations (8.7), (8.8) and (8.11) gives 

\[ \mathbb{P}\{s(X_1, Y_1) < x\} \le \frac{4x^2}{n^2} \]

as required. □ 
Lemma 29 (Restatement). If $x$ and $y$ satisfy $x \le b \le x + y \le c$, and 
$|V_0(y)| \ge R$, then 

\[ \mathbb{P}\{s(X_1, Y_1) \ge x + y\} \ge \frac{2b(R - 3x - 3y)}{n^2}. \]

Proof: Just like in Lemma 27, consider the two possibilities that 

\[ \sigma = (a_1, \ldots, a_m, b, c), \qquad \tau = (a_1, \ldots, a_m, b + c) \tag{8.12} \]





and that 

\[ \sigma = (a_1, \ldots, a_m, b + c), \qquad \tau = (a_1, \ldots, a_m, b, c) \tag{8.13} \]

since $V_t(y)$ depends on $X_t$ and not on $Y_t$. As in the previous lemma, in both 
cases (8.12) and (8.13), 

\[ \sum_{a_i \ge y} a_i \ge |V_0(y)| - (b + c) \ge R - (b + c) \tag{8.14} \]

so case (8.12) may be assumed. Identical arguments will apply for (8.13). 

There are two possible ways to have $s(X_1, Y_1) \ge x + y$: either $b$ can merge 
with an $a_i \ge y$ in $\sigma$, or $b$ and $c$ can merge in $\sigma$, while $b + c$ is split into 
$\{r, b + c - r\}$ in $\tau$, where $r \ge x + y$. Consider those cases separately. 



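Merging $b$ and $a_i \ge y$ in $\sigma$: Note that if $b$ and $a_i$ are merged in $\sigma$, then 

\[ X_1 = (a'_1, \ldots, a'_{m-1}, b + a_i, c), \qquad Y_1 = (a'_1, \ldots, a'_{m-1}, b + a_i + c) \]

where $\{a'_1, \ldots, a'_{m-1}\} = \{a_1, \ldots, a_m\} \setminus \{a_i\}$. Therefore, $s(X_1, Y_1) = \min(b + a_i, c)$. 
Since $b \ge x$ and $a_i \ge y$, $b + a_i \ge x + y$. By assumption, $c \ge x + y$, and 
so $s(X_1, Y_1) \ge x + y$. 

The probability of $b$ merging with a particular $a_i$ is $\frac{2ba_i}{n^2}$, and using the bound 
in Equation (8.14), 

\[ \mathbb{P}\{s(X_1, Y_1) \ge x + y,\ b \text{ merges with some } a_i \text{ in } \sigma\} \ge \sum_{a_i \ge y} \frac{2ba_i}{n^2} = \frac{2b}{n^2} \sum_{a_i \ge y} a_i \ge \frac{2b(R - b - c)}{n^2}. \tag{8.15} \]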

Merging $b$ and $c$ in $\sigma$: If $c \le 2x + 2y$, it will be shown later that the bound 
in Equation (8.15) suffices. Therefore, for this case, assume that $c > 2x + 2y$. 
Consider the probability of merging $b$ and $c$ in $\sigma$, while splitting $b + c$ in $\tau$ into 
$\{r, b + c - r\}$, where $r \ge x + y$. 

Let $P_r$ be defined as in Equation (8.9). Now a lower bound on 

\[ \sum_{r \ge x+y} P_r = \mathbb{P}\{\text{merge } b \text{ and } c \text{ in } \sigma\} - \mathbb{P}\{\text{merge } b \text{ and } c \text{ in } \sigma, \text{ stay at } \tau\} - \sum_{r < x+y} P_r \]

is needed. The above equality follows because merging $b$ and $c$ in $\sigma$ is always 
either coupled with splitting $b + c$ into $\{r, b + c - r\}$ in $\tau$, or staying at $\tau$. Here 
is a lower bound for the right-hand side. 
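To start, $c > 2x + 2y \ge 2b$. By Equation 

\[ \mathbb{P}\{\text{merge } b \text{ and } c \text{ in } \sigma, \text{ stay at } \tau\} = \min\left(p, \frac{2bc}{n^2}\right) \]

where $p = \mathbb{P}(b + c \text{ split into } \{b, c\} \text{ in } \tau, \text{ staying at } \sigma)$. Now, since $c > 2b$, 

\[ p = \mathbb{P}(b + c \text{ split into } \{b, c\} \text{ in } \tau) - \mathbb{P}(c \text{ split into } \{b, c - b\} \text{ in } \sigma) = \frac{2(b + c)}{n^2} - \frac{2c}{n^2} = \frac{2b}{n^2} \]

and hence, since $\frac{2b}{n^2} \le \frac{2bc}{n^2}$, 

\[ \mathbb{P}\{\text{merge } b \text{ and } c \text{ in } \sigma, \text{ stay at } \tau\} = \min\left(\frac{2b}{n^2}, \frac{2bc}{n^2}\right) = \frac{2b}{n^2}. \]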

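Furthermore, since $x + y \le \frac{c}{2}$, Equation (8.10) above implies that if $r < x + y$ 
then $P_r \le \frac{2b}{n^2}$, and therefore 

\[ \mathbb{P}\{\text{merge } b \text{ and } c \text{ in } \sigma, \text{ stay at } \tau\} + \sum_{r < x+y} P_r \le \frac{2b(x + y)}{n^2}. \]

Thus, since the probability of merging $b$ and $c$ is $\frac{2bc}{n^2}$, 

\[ \mathbb{P}\{s(X_1, Y_1) \ge x + y,\ b \text{ and } c \text{ merge in } \sigma\} = \sum_{r \ge x+y} P_r \ge \frac{2bc}{n^2} - \frac{2b(x + y)}{n^2} = \frac{2b(c - x - y)}{n^2}. \tag{8.16} \]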

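Combining all this information, if $c \le 2x + 2y$, then Equation (8.15) shows that 

\[ \mathbb{P}\{s(X_1, Y_1) \ge x + y\} \ge \frac{2b(R - b - c)}{n^2} \ge \frac{2b(R - 3x - 3y)}{n^2} \]

using the fact that $b \le x + y$. Furthermore, if $c > 2x + 2y$, then combining 
Equations (8.15) and (8.16), 

\[ \mathbb{P}\{s(X_1, Y_1) \ge x + y\} \ge \frac{2b(R - b - c)}{n^2} + \frac{2b(c - x - y)}{n^2} = \frac{2b(R - b - x - y)}{n^2} \ge \frac{2b(R - 2x - 2y)}{n^2}. \]

Hence, in either case $\mathbb{P}\{s(X_1, Y_1) \ge x + y\} \ge \frac{2b(R - 3x - 3y)}{n^2}$, completing the 
proof. □ 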
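
The final case analysis is pure arithmetic, so it can be checked mechanically. The Python sketch below is an illustration under the stated hypotheses $x \le b \le x + y \le c$, not part of the proof: it combines the numerators of (8.15) and (8.16) exactly as in the two cases above and compares the result with the claimed numerator $2b(R - 3x - 3y)$. The function name `merged_lower_bound_numerator` is hypothetical, and all quantities are $n^2$ times the corresponding probabilities.

```python
def merged_lower_bound_numerator(b, c, x, y, R):
    """n^2 times the lower bound on P{s(X_1, Y_1) >= x + y} from the two cases."""
    bound = 2 * b * (R - b - c)           # contribution of (8.15)
    if c > 2 * x + 2 * y:
        bound += 2 * b * (c - x - y)      # extra contribution of (8.16)
    return bound


def claim_holds(b, c, x, y, R):
    """Check the claimed bound 2b(R - 3x - 3y) against the combined bound."""
    return merged_lower_bound_numerator(b, c, x, y, R) >= 2 * b * (R - 3 * x - 3 * y)
```

For instance, with $b = 3$, $c = 20$, $x = 2$, $y = 2$ and $R = 40$, the second case applies and the combined numerator is $2 \cdot 3 \cdot (40 - 3 - 4) = 198$, compared with the claimed $2 \cdot 3 \cdot (40 - 12) = 168$.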